FOCUS TRACKING IN DIALOGS

CROSS-REFERENCE TO RELATED APPLICATIONS 

The present application relates to U.S. Patent Application entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE having Serial No. 10/087,608, filed October 21, 2001, and published as US 2003/0130854, and U.S. Patent Application entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE having Serial No. 10/426,053, filed April 28, 2003, the contents of which are hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION 
The present invention relates to access of information over a wide area network such as the Internet. More particularly, the present invention relates to web enabled recognition allowing information and control on a client side device to be entered using a variety of methods.

Small computing devices such as personal information manager (PIM) devices and portable phones are used with ever increasing frequency by people in their day-to-day activities. With the increase in processing power now available for the microprocessors used to run these devices, the functionality of these devices is increasing, and in some cases, merging. For instance, many portable phones can now be used to access and browse the Internet as well as to store personal information such as addresses, phone numbers and the like.

Because these computing devices are being used for browsing the Internet, or are used in other server/client architectures, it is necessary to enter information into the computing device. Unfortunately, due to the desire to keep these devices as small as possible so that they are easily carried, conventional keyboards having all the letters of the alphabet as isolated buttons are usually not possible due to the limited surface area available on the housings of the computing devices.

To address this problem, there has been increased interest in and adoption of using voice or speech to provide and access such information, particularly over a wide area network such as the Internet. Published U.S. Patent Application US 2003/0130854, entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE, and U.S. Patent Application entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE having Serial No. 10/426,053 and filed April 28, 2003, describe a method and system defining controls for a web server to generate client side markups that include recognition and/or audible prompting.

Each of the controls performs a role in the dialog. For instance, controls can include a prompt object used to generate corresponding markup for the client device to present information to the user, or to generate markups for the client device to ask a question. An answer control or object generates markup for the client device so that a grammar used for recognition is associated with an input field related to a question that has been asked. If it is unclear whether or not a recognized result is correct, a confirmation mechanism can be activated and generate markup to confirm a recognized result. A command control generates markup that allows the user to provide commands, which are other than the expected answers to a specific question, and thus allows the user to navigate through the web server application, for example. A module, when executed such as on a client, creates a dialog to solicit and provide information as a function of the controls.

The module can use a control mechanism that identifies an order for the dialog, for example, an order for asking questions. The controls include activation logic that may activate other controls based on the answer given by the user. In many cases, the controls specify and allow the user to provide extra answers, which are commonly answers to questions yet to be asked, and thereby cause the system to skip such questions since such answers have already been provided. This type of dialog is referred to as "mixed-initiative" since both the system and the user have some control of the dialog flow.
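The skip-ahead behavior of a mixed-initiative dialog can be sketched in Python. This is a minimal illustration under stated assumptions, not the interface of the controls described here: `QuestionControl`, `MixedInitiativeDialog`, the slot names and the travel values are all hypothetical, and activation logic is reduced to "ask the first unanswered question in authored order".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuestionControl:
    """Hypothetical stand-in for a question control that fills one slot."""
    slot: str
    prompt: str


@dataclass
class MixedInitiativeDialog:
    """Asks questions in authored order but skips any already answered."""
    questions: List[QuestionControl]
    answers: Dict[str, str] = field(default_factory=dict)

    def accept(self, recognized: Dict[str, str]) -> None:
        # One utterance may answer the current question and also carry
        # "extra answers" for questions that have not been asked yet.
        self.answers.update(recognized)

    def next_question(self) -> Optional[QuestionControl]:
        # The first unanswered question in authored order is asked next.
        for question in self.questions:
            if question.slot not in self.answers:
                return question
        return None


dialog = MixedInitiativeDialog([
    QuestionControl("origin", "Where are you leaving from?"),
    QuestionControl("destination", "Where are you going?"),
    QuestionControl("date", "On what date?"),
])
dialog.accept({"origin": "Seattle", "date": "Friday"})  # "date" is an extra answer
print(dialog.next_question().slot)  # destination
```

Because the date arrived as an extra answer, the date question is never asked; only the destination question remains.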

However, when users are allowed to provide many pieces of information in one sentence, it becomes difficult to ensure that the system will respond appropriately. For example, suppose a system asks a user for a phone number. In this example, the phone number includes an area code, a local number and an extension. In a mixed-initiative dialog, the user could provide the full number or just a portion of it. The system may need to confirm portions of the number that have been given and would need to ask for the remaining portions of the number. If the user denies or corrects a portion that the system misunderstood, the system would need to ask for it again. Ideally, the system would always confirm or ask a question about the portions of the number that the user just provided. In contrast, if the system were to confirm or ask a question about another portion of the number, the dialog would seem confusing and hard to follow. Given the large number of possible dialog flows, which grows with the number of permutations of extra answers that can be provided, a logical dialog flow is difficult to achieve. In some cases, the system may follow a hard-coded path through the dialog and appear, from the user's point of view, to ignore the information it was given. However, that information is usually processed later, which can further add to the confusion.
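To make the problem concrete, the following Python sketch picks the next portion to confirm purely from a fixed, authored order. The names `FIXED_ORDER` and `next_to_confirm_fixed` and the phone number values are hypothetical placeholders. After the user's latest sentence supplies the local number and extension, the hard-coded rule still selects the area code, which is the kind of seemingly unrelated confirmation described above.

```python
from typing import Dict, List, Optional

# Authored, hard-coded order for the portions of a phone number.
FIXED_ORDER: List[str] = ["area_code", "local_number", "extension"]


def next_to_confirm_fixed(unconfirmed: Dict[str, str]) -> Optional[str]:
    """Return the first unconfirmed portion in authored order, ignoring recency."""
    for portion in FIXED_ORDER:
        if portion in unconfirmed:
            return portion
    return None


# Earlier turn: the area code was heard but not yet confirmed.
unconfirmed = {"area_code": "425"}
# Latest turn: the user says the local number and extension in one sentence.
unconfirmed.update({"local_number": "555-0100", "extension": "1234"})

print(next_to_confirm_fixed(unconfirmed))  # area_code
```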

There is thus an ongoing need to improve upon the methods used to provide speech recognition in an application having a server/client architecture, such as one operating over the Internet. In particular, a method, system or authoring tool is needed that addresses one, several or all of the foregoing disadvantages and thereby provides generation of speech-enabled recognition and/or speech-enabled prompting in an application.

SUMMARY OF THE INVENTION
Controls are provided for a web server to generate client side markups that include recognition and/or audible prompting. The controls comprise elements of a dialog such as a question, answer, confirmation, command or statement. A module forms a dialog by making use of the information carried in the controls.

Each of the controls performs a role in the dialog. For instance, controls can include a prompt object used to generate corresponding markup for the client device to present information to the user, or to generate markups for the client device to ask a question. An answer control or object generates markup for the client device so that a grammar used for recognition is associated with an input field related to a question that has been asked. If it is unclear whether or not a recognized result is correct, a confirmation mechanism can be activated and generate markup to confirm a recognized result. A module, when executed such as on a client, creates a dialog to solicit and provide information as a function of the controls.

An aspect of the present invention is to allow the system to automatically adapt the dialog flow so that it stays focused on the user's most recent inputs. Generally, whenever recognition results are received, this information is retained in a manner that indicates the relative order in which it was received. In this manner, the most recent recognition results can be identified. In one embodiment, memory is used in the form of a "stack". The stack comprises identifiers related to recognition results received. When the dialog is created, it looks for controls related to the recognition results identified at the top of the stack, for example, to determine whether such a recognition result needs to be confirmed. Although the controls typically include means, such as an attribute, to indicate a selected order for execution, controls later in the selected order can be "promoted" and run before earlier controls, provided that they are related to the top-most entry of the stack whereas the earlier controls are not.
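The focus-stack idea can be sketched in Python as follows. This is only one possible reading of the summary above, not the actual implementation: `Control`, `FocusStack`, `select_next` and the integer `order` field are hypothetical stand-ins for the controls and their order attribute, only the top-most stack entry is consulted, and `controls` is assumed to hold just the controls still eligible to run. A control related to the most recent recognition result is promoted ahead of controls that precede it in the authored order.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Control:
    """Hypothetical dialog control (e.g., a confirmation) with an authored order."""
    name: str
    order: int               # selected execution order: lower runs first
    related_items: Set[str]  # identifiers of recognition results it concerns


@dataclass
class FocusStack:
    """Identifiers of recognition results; the most recent is at the end (top)."""
    items: List[str] = field(default_factory=list)

    def push(self, item_id: str) -> None:
        # Re-pushing an identifier moves it back to the top of the stack.
        if item_id in self.items:
            self.items.remove(item_id)
        self.items.append(item_id)

    def top(self) -> Optional[str]:
        return self.items[-1] if self.items else None


def select_next(controls: List[Control], stack: FocusStack) -> Optional[Control]:
    """Promote a control related to the top of the stack; else use authored order."""
    if not controls:
        return None
    by_order = sorted(controls, key=lambda c: c.order)
    top = stack.top()
    for control in by_order:
        if top is not None and top in control.related_items:
            return control  # promoted ahead of earlier, unrelated controls
    return by_order[0]


controls = [
    Control("confirm_area_code", 1, {"area_code"}),
    Control("confirm_local_number", 2, {"local_number"}),
    Control("confirm_extension", 3, {"extension"}),
]
stack = FocusStack()
stack.push("area_code")  # heard earlier
stack.push("extension")  # heard most recently
print(select_next(controls, stack).name)  # confirm_extension
```

If the user later corrects the area code, pushing its identifier again moves it to the top, so the area-code confirmation is promoted in turn. Whether an implementation also consults entries below the top is left open in this sketch.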

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a plan view of a first embodiment 
of a computing device operating environment. 

FIG. 2 is a block diagram of the computing 
device of FIG. 1. 

FIG. 3 is a block diagram of a general purpose computer.

FIG. 4 is a block diagram of an architecture for a client/server system.

FIG. 5 is a display for obtaining credit 
card information. 

FIG. 6 is a block diagram illustrating a first approach for providing recognition and audible prompting in client side markups.

FIG. 7 is a block diagram illustrating a 
second approach for providing recognition and audible 
prompting in client side markups. 
FIG. 8 is a block diagram illustrating a third approach for providing recognition and audible prompting in client side markups.

FIG. 9 is a block diagram illustrating companion controls.

FIG. 10 is a detailed block diagram 
illustrating companion controls of a first embodiment. 

FIG. 11 is a block diagram illustrating companion controls of a second embodiment.

FIG. 12 is a block diagram illustrating speech controls inheritance for the second embodiment.

FIG. 13 is a pictorial representation of a stack used for focusing a dialog.

FIG. 14 illustrates a method for comparing a SemanticItem on the focus stack with answers or confirms related to a QA.

FIG. 15 is a pictorial representation of information to be gathered organized as "topics".

FIG. 16 illustrates an exemplary display rendering for a travel page.

DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS 

Before describing the architecture of web based recognition and methods for implementing the same, it may be useful to describe generally the computing devices that can function in the architecture. Referring now to FIG. 1, an exemplary form of a data management device (PIM, PDA or the like) is illustrated at 30. However, it is contemplated that the present invention can also be practiced using other computing devices discussed below, and in particular, those computing devices having limited surface areas for input buttons or the like. For example, phones and/or data management devices will also benefit from the present invention. Such devices will have an enhanced utility compared to existing portable personal information management devices and other portable electronic devices, and the functions and compact size of such devices will more likely encourage the user to carry the device at all times. Accordingly, it is not intended that the scope of the architecture herein described be limited by the disclosure of an exemplary data management or PIM device, phone or computer herein illustrated.

An exemplary form of a data management mobile device 30 is illustrated in FIG. 1. The mobile device 30 includes a housing 32 and has a user interface including a display 34, which uses a contact sensitive display screen in conjunction with a stylus 33. The stylus 33 is used to press or contact the display 34 at designated coordinates to select a field, to selectively move a starting position of a cursor, or to otherwise provide command information such as through gestures or handwriting. Alternatively, or in addition, one or more buttons 35 can be included on the device 30 for navigation. In addition, other input mechanisms such as rotatable wheels, rollers or the like can also be provided. However, it should be noted that the invention is not intended to be limited by these forms of input mechanisms. For instance, another form of input can include a visual input such as through computer vision.
Referring now to FIG. 2, a block diagram illustrates the functional components comprising the mobile device 30. A central processing unit (CPU) 50 implements the software control functions. CPU 50 is coupled to display 34 so that text and graphic icons generated in accordance with the controlling software appear on the display 34. A speaker 43 can be coupled to CPU 50, typically with a digital-to-analog converter 59, to provide an audible output. Data that is downloaded or entered by the user into the mobile device 30 is stored in a non-volatile read/write random access memory store 54 bi-directionally coupled to the CPU 50. Random access memory (RAM) 54 provides volatile storage for instructions that are executed by CPU 50, and storage for temporary data, such as register values. Default values for configuration options and other variables are stored in a read only memory (ROM) 58. ROM 58 can also be used to store the operating system software for the device that controls the basic functionality of the mobile device 30 and other operating system kernel functions (e.g., the loading of software components into RAM 54).

RAM 54 also serves as storage for the code in a manner analogous to the function of a hard drive on a PC that is used to store application programs. It should be noted that although non-volatile memory is used for storing the code, the code alternatively can be stored in volatile memory that is not used for execution of the code.
Wireless signals can be transmitted/received by the mobile device through a wireless transceiver 52, which is coupled to CPU 50. An optional communication interface 60 can also be provided for downloading data directly from a computer (e.g., desktop computer), or from a wired network, if desired. Accordingly, interface 60 can comprise various forms of communication devices, for example, an infrared link, modem, a network card, or the like.

Mobile device 30 includes a microphone 29, an analog-to-digital (A/D) converter 37, and an optional recognition program (speech, DTMF, handwriting, gesture or computer vision) stored in store 54. By way of example, in response to audible information, instructions or commands from a user of device 30, microphone 29 provides speech signals, which are digitized by A/D converter 37. The speech recognition program can perform normalization and/or feature extraction functions on the digitized speech signals to obtain intermediate speech recognition results. Using wireless transceiver 52 or communication interface 60, speech data is transmitted to a remote recognition server 204 discussed below and illustrated in the architecture of FIG. 4. Recognition results are then returned to mobile device 30 for rendering (e.g., visual and/or audible) thereon, and eventual transmission to a web server 202 (FIG. 4), wherein the web server 202 and mobile device 30 operate in a client/server relationship. Similar processing can be used for other forms of input. For example, handwriting input can be digitized with or without pre-processing on device 30. Like the speech data, this form of input can be transmitted to the recognition server 204 for recognition, wherein the recognition results are returned to the device 30 and/or web server 202. Likewise, DTMF data, gesture data and visual data can be processed similarly. Depending on the form of input, device 30 (and the other forms of clients discussed below) would include necessary hardware such as a camera for visual input.

In addition to the portable or mobile computing devices described above, it should also be understood that the present invention can be used with numerous other computing devices such as a general desktop computer. For instance, the present invention will allow a user with limited physical abilities to input or enter text into a computer or other computing device when other conventional input devices, such as a full alpha-numeric keyboard, are too difficult to operate.

The invention is also operational with numerous other general purpose or special purpose computing systems, environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, wireless or cellular telephones, regular telephones (without any screen), personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The following is a brief description of a general purpose computer 120 illustrated in FIG. 3. However, the computer 120 is again only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computer 120 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated therein.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. Tasks performed by the programs and modules are described below and with the aid of figures. Those skilled in the art can implement the description and figures as processor executable instructions, which can be written on any form of a computer readable medium.

With reference to FIG. 3, components of computer 120 may include, but are not limited to, a processing unit 140, a system memory 150, and a system bus 141 that couples various system components including the system memory to the processing unit 140. The system bus 141 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Universal Serial Bus (USB), Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus. Computer 120 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 120 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 120.

Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

The system memory 150 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 151 and random access memory (RAM) 152. A basic input/output system 153 (BIOS), containing the basic routines that help to transfer information between elements within computer 120, such as during start-up, is typically stored in ROM 151. RAM 152 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 140. By way of example, and not limitation, FIG. 3 illustrates operating system 154, application programs 155, other program modules 156, and program data 157.

The computer 120 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only, FIG. 3 illustrates a hard disk drive 161 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 171 that reads from or writes to a removable, nonvolatile magnetic disk 172, and an optical disk drive 175 that reads from or writes to a removable, nonvolatile optical disk 176 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 161 is typically connected to the system bus 141 through a non-removable memory interface such as interface 160, and magnetic disk drive 171 and optical disk drive 175 are typically connected to the system bus 141 by a removable memory interface, such as interface 170.

The drives and their associated computer storage media discussed above and illustrated in FIG. 3 provide storage of computer readable instructions, data structures, program modules and other data for the computer 120. In FIG. 3, for example, hard disk drive 161 is illustrated as storing operating system 164, application programs 165, other program modules 166, and program data 167. Note that these components can either be the same as or different from operating system 154, application programs 155, other program modules 156, and program data 157. Operating system 164, application programs 165, other program modules 166, and program data 167 are given different numbers here to illustrate that, at a minimum, they are different copies.

A user may enter commands and information into the computer 120 through input devices such as a keyboard 182, a microphone 183, and a pointing device 181, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 140 through a user input interface 180 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 184 or other type of display device is also connected to the system bus 141 via an interface, such as a video interface 185. In addition to the monitor, computers may also include other peripheral output devices such as speakers 187 and printer 186, which may be connected through an output peripheral interface 188.

The computer 120 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 194. The remote computer 194 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 120. The logical connections depicted in FIG. 3 include a local area network (LAN) 191 and a wide area network (WAN) 193, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 120 is connected to the LAN 191 through a network interface or adapter 190. When used in a WAN networking environment, the computer 120 typically includes a modem 192 or other means for establishing communications over the WAN 193, such as the Internet. The modem 192, which may be internal or external, may be connected to the system bus 141 via the user input interface 180, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 120, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 3 illustrates remote application programs 195 as residing on remote computer 194. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

EXEMPLARY ARCHITECTURE 
FIG. 4 illustrates architecture 200 for web based recognition as can be used with the present invention. Generally, information stored in a web server 202 can be accessed through mobile device 30 (which herein also represents other forms of computing devices having a display screen, a microphone, a camera, a touch sensitive panel, etc., as required based on the form of input), or through phone 80 wherein information is requested audibly or through tones generated by phone 80 in response to keys depressed and wherein information from web server 202 is provided only audibly back to the user.

In this exemplary embodiment, architecture 200 is unified in that whether information is obtained through device 30 or phone 80 using speech recognition, a single recognition server 204 can support either mode of operation. In addition, architecture 200 operates using an extension of well-known markup languages (e.g., HTML, XHTML, cHTML, XML, WML, and the like). Thus, information stored on web server 202 can also be accessed using well-known GUI methods found in these markup languages. By using an extension of well-known markup languages, authoring on the web server 202 is easier, and legacy applications currently existing can also be easily modified to include voice or other forms of recognition.

Generally, device 30 executes HTML+ scripts, or the like, provided by web server 202. When voice recognition is required, by way of example, speech data, which can be digitized audio signals or speech features wherein the audio signals have been preprocessed by device 30 as discussed above, are provided to recognition server 204 with an indication of a grammar or language model to use during speech recognition. The implementation of the recognition server 204 can take many forms, one of which is illustrated, but generally includes a recognizer 211. The results of recognition are provided back to device 30 for local rendering if desired or appropriate. Upon compilation of information through recognition and any graphical user interface if used, device 30 sends the information to web server 202 for further processing and receipt of further HTML scripts, if necessary.
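The round trip just described (speech data sent with a grammar indication, results returned to the client, collected values posted to the web server) can be sketched in Python. Everything here is a hypothetical stand-in chosen for illustration: `RecognitionRequest`, `RecognitionServer`, `Client`, the grammar name `city.grxml`, and the lambda standing in for recognizer 211 are not a real API, and no network transport is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RecognitionRequest:
    """Hypothetical request: preprocessed speech data plus the grammar to use."""
    features: bytes
    grammar_url: str


@dataclass
class RecognitionResult:
    text: str
    confidence: float


class RecognitionServer:
    """Stand-in for recognition server 204; `recognizer` plays recognizer 211."""

    def __init__(self, recognizer: Callable[[bytes, str], RecognitionResult]):
        self.recognizer = recognizer

    def recognize(self, req: RecognitionRequest) -> RecognitionResult:
        return self.recognizer(req.features, req.grammar_url)


class Client:
    """Stand-in for device 30: fills fields, then posts them to the web server."""

    def __init__(self, reco: RecognitionServer,
                 post_to_web_server: Callable[[Dict[str, str]], str]):
        self.reco = reco
        self.post = post_to_web_server
        self.fields: Dict[str, str] = {}

    def fill_field(self, name: str, features: bytes, grammar_url: str) -> RecognitionResult:
        result = self.reco.recognize(RecognitionRequest(features, grammar_url))
        self.fields[name] = result.text  # local rendering could happen here
        return result

    def submit(self) -> str:
        # The web server's reply would be the next page of markup.
        return self.post(dict(self.fields))


server = RecognitionServer(lambda feats, grammar: RecognitionResult("seattle", 0.9))
client = Client(server, post_to_web_server=lambda fields: "<next page markup>")
client.fill_field("origin", b"\x01\x02", "city.grxml")
print(client.submit())  # <next page markup>
```

Keeping the recognizer behind a plain callable mirrors the point made below that recognition server 204 can be designed and updated independently of the web server.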

As illustrated in FIG. 4, device 30, web server 202 and recognition server 204 are commonly connected, and separately addressable, through a network 205, herein a wide area network such as the Internet. It therefore is not necessary that any of these devices be physically located adjacent to each other. In particular, it is not necessary that web server 202 include recognition server 204. In this manner, authoring at web server 202 can be focused on the application to which it is intended without the authors needing to know the intricacies of recognition server 204. Rather, recognition server 204 can be independently designed and connected to the network 205, and thereby be updated and improved without further changes required at web server 202. As discussed below, web server 202 can also include an authoring mechanism that can dynamically generate client-side markups and scripts. In a further embodiment, the web server 202, recognition server 204 and client 30 may be combined depending on the capabilities of the implementing machines. For instance, if the client comprises a general purpose computer, e.g., a personal computer, the client may include the recognition server 204. Likewise, if desired, the web server 202 and recognition server 204 can be incorporated into a single machine.

Access to web server 202 through phone 80 includes connection of phone 80 to a wired or wireless telephone network 208 that, in turn, connects phone 80 to a third party gateway 210. Gateway 210 connects phone 80 to a telephony voice browser 212. Telephony voice browser 212 includes a media server 214 that provides a telephony interface and a voice browser 216. Like device 30, telephony voice browser 212 receives HTML scripts or the like from web server 202. In one embodiment, the HTML scripts are of a form similar to the HTML scripts provided to device 30. In this manner, web server 202 need not support device 30 and phone 80 separately, or even support standard GUI clients separately. Rather, a common markup language can be used. In addition, as with device 30, audible signals transmitted by phone 80 are provided from voice browser 216 to recognition server 204 for voice recognition, either through the network 205, or through a dedicated line 207, for example, using TCP/IP. Web server 202, recognition server 204 and telephony voice browser 212 can be embodied in any suitable computing environment such as the general purpose desktop computer illustrated in FIG. 3.

However, it should be noted that if DTMF recognition is employed, this form of recognition would generally be performed at the media server 214, rather than at the recognition server 204. In other words, the DTMF grammar would be used by the media server 214.

Referring back to FIG. 4, web server 202 can include a server side plug-in authoring tool or module 209 (e.g. ASP, ASP+, ASP.NET by Microsoft Corporation, JSP, JavaBeans, or the like). Server side plug-in module 209 can dynamically generate client-side markups, and even a specific form of markup for the type of client accessing the web server 202. The client information can be provided to the web server 202 upon initial establishment of the client/server relationship, or the web server 202 can include modules or routines to detect the capabilities of the client device. In this manner, server side plug-in module 209 can generate a client side markup for each of the voice recognition scenarios, i.e. voice only through phone 80 or multimodal for device 30. By using a consistent client side model, application authoring for many different clients is significantly easier.
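The per-client rendering described above can be illustrated with a minimal sketch. It is not the code of server side plug-in module 209: the `hasDisplay` capability flag, the `selectScenario` and `renderPage` functions and the emitted element names are all hypothetical, chosen only to show one set of field definitions producing voice-only markup for a telephony client and multimodal markup for a client with a display.

```javascript
// Hypothetical sketch: pick a markup flavor from the client's capabilities,
// whether reported at connection time or detected by the server.
function selectScenario(client) {
  // No display (e.g. a telephony voice browser): voice-only markup.
  // A display (e.g. a handheld device): multimodal markup.
  return client.hasDisplay ? "multimodal" : "voice-only";
}

// Render the same field list for whichever scenario applies.
// Element names are illustrative, not a particular markup standard.
function renderPage(client, fields) {
  const scenario = selectScenario(client);
  const parts = fields.map((f) =>
    scenario === "multimodal"
      ? `<input id="${f.id}"/><grammar src="${f.grammar}"/>` // user selects the field
      : `<prompt>${f.prompt}</prompt><grammar src="${f.grammar}"/>` // field is asked aloud
  );
  return { scenario, markup: parts.join("\n") };
}
```

Keeping the field definitions in one place and deriving the markup from them per request is what lets a single application serve both phone 80 and device 30.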

In addition to dynamically generating client side markups, high-level dialog modules, discussed below, can be implemented as a server-side control stored in store 211 for use by developers in application authoring. In general, the high-level dialog modules 211 would dynamically generate client-side markup and script in both voice-only and multimodal scenarios based on parameters specified by developers. The high-level dialog modules 211 can include parameters to generate client-side markups to fit the developers' needs.

EXEMPLARY CLIENT SIDE EXTENSIONS 
Before describing further aspects of the present invention, it may be helpful to first discuss an exemplary form of extensions to the markup language for use in web based recognition.

As indicated above, the markup languages such as HTML, XHTML, cHTML, XML, WML or any other SGML-derived markup, which are used for interaction between the web server 202 and the client device 30 and phone 80, are extended to include controls and/or objects that provide recognition in a client/server architecture. Generally, controls and/or objects can include one or more of the following functions: recognizer controls and/or objects for recognizer configuration, recognizer execution and/or post-processing; synthesizer controls and/or objects for synthesizer configuration and prompt playing; grammar controls and/or objects for specifying input grammar resources; and/or binding controls and/or objects for processing recognition results. The extensions are designed to be a lightweight markup layer, which adds the power of an audible, visual, handwriting, etc. interface to existing markup languages. As such, the extensions can remain independent of: the high-level page in which they are contained, e.g. HTML; the low-level formats which the extensions use to refer to linguistic resources, e.g. the text-to-speech and grammar formats; and the individual properties of the recognition and speech synthesis platforms used in the recognition server 204.

It should be noted that a markup language extension such as Speech Application Language Tags (SALT) can be used. SALT is a developing standard for enabling access to information, applications and web services from personal computers, telephones, tablet PCs and wireless mobile devices, for example. SALT extends existing markup languages such as HTML, XHTML and XML. An example of the SALT specification can be found in Published U.S. Patent Application US 2003/0130854, entitled APPLICATION ABSTRACTION WITH DIALOG PURPOSE, which is herein incorporated by reference in its entirety. The SALT specification may be found online at http://www.SALTforum.org. Further details regarding the extensions are not necessary for understanding the present invention.

Although speech recognition will be discussed below, it should be understood that the techniques, tags and server side controls described hereinafter can be similarly applied in handwriting recognition, gesture recognition and image recognition.

At this point, though, a particular mode of entry should be discussed. In particular, the use of speech recognition in conjunction with at least a display and, in a further embodiment, a pointing device, which together enable the coordination of multiple modes of input (e.g. to indicate the fields for data entry), is particularly useful. Specifically, in this mode of data entry, the user is generally able to coordinate the actions of the pointing device with the speech input, so that, for example, the user controls when to select a field and provide the corresponding information relevant to that field. For instance, in the credit card submission graphical user interface (GUI) illustrated in FIG. 5, a user could first decide to enter the credit card number in field 252, then enter the type of credit card in field 250, followed by the expiration date in field 254. Likewise, the user could return to field 252 and correct an errant entry, if desired. When combined with speech recognition, an easy and natural form of navigation is provided. As used herein, this form of entry, using both recognition and a screen display that allows free-form actions of the pointing device on the screen (e.g. the selection of fields), is called "multimodal". When rendered using the phone 80 in a voice-only application, the user would be prompted to provide the information illustrated in FIG. 5.
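The multimodal behavior described for FIG. 5 can be sketched as a small state holder, assuming a hypothetical `MultimodalForm` class whose field ids stand in for fields 250, 252 and 254. The point it illustrates is that the pointing device decides where a recognition result goes, so fields can be filled, and later corrected, in any order.

```javascript
// Hypothetical sketch of multimodal ("tap-and-talk") entry on a form like
// the credit card form of FIG. 5: the field the user last selected with the
// pointing device receives the next recognition result.
class MultimodalForm {
  constructor(fieldIds) {
    // Every field starts empty; no field is selected yet.
    this.values = Object.fromEntries(fieldIds.map((id) => [id, ""]));
    this.activeField = null;
  }

  // Pointing-device selection of a field (e.g. a tap on the display).
  select(fieldId) {
    if (!(fieldId in this.values)) throw new Error(`unknown field: ${fieldId}`);
    this.activeField = fieldId;
  }

  // A recognized utterance fills whichever field is currently selected.
  // Returns false when no field has been selected yet.
  onRecognized(text) {
    if (this.activeField === null) return false;
    this.values[this.activeField] = text;
    return true;
  }
}
```

Re-selecting the card number field and speaking again simply overwrites the earlier value, which is the correction scenario described above.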

GENERATION OF CLIENT SIDE MARKUPS 
As indicated above, server side plug-in module 209 outputs client side markups when a request has been made from the client device 30 or telephony voice browser 212. Although the following may be described with respect to the client device 30, it should be understood that the telephony voice browser 212 is intended as an example device for voice-only applications. In short, the server side plug-in module 209 allows the website, and thus the application and the services provided by the application, to be defined or constructed. The instructions in the server side plug-in module 209 are made of compiled code. The code is run when a web request reaches the web server 202. The server side plug-in module 209 then outputs a new client side markup page that is sent to the client device 30 or telephony voice browser 212. As is well known, this process is commonly referred to as rendering. The server side plug-in module 209 operates on "controls" that abstract and encapsulate the markup language, and thus the code of the client side markup page. Such controls, which abstract and encapsulate the markup language and operate on the web server 202, include or are equivalent to "Servlets" or "Server-side plug-ins", to name a few.

As is known, server side plug-in modules of the prior art can generate client side markup for visual rendering and interaction with the client device 30. Three different approaches are provided herein for extending the server side plug-in module 209 to include recognition and audible prompting extensions such as the exemplary client side extensions discussed above. In a first approach, illustrated schematically in Fig. 6, the current visual server side controls (which include parameters for visual display such as location for rendering, font, foreground color, background color, etc.) are extended to include parameters or attributes for recognition and for audible prompting related to recognition. Using speech recognition and associated audible prompting by way of example, the attributes generally pertain to audible prompting parameters such as whether the prompt comprises inline text for text-to-speech conversion or the playing of a prerecorded audio file (e.g. a wave file), the location of the data (text for text-to-speech conversion or a prerecorded audio file) for audible rendering, etc. For recognition, the parameters or attributes can include the location of the grammar to be used during recognition, confidence level thresholds, etc. Since the server side plug-in module 209 generates client side markup, the parameters and attributes for the controls of the server side plug-in module 209 relate to the extensions provided in the client side markup for recognition and/or audible prompting.

The controls indicated at 300A in Fig. 6 are well known in website application development or authoring tools such as ASP, ASP+, ASP.NET, JSP, JavaBeans, or the like. Such controls are commonly formed in a library and used by controls 302 to perform a particular visual task. Library 300A includes methods for generating the desired client markup, event handlers, etc. Examples of visual controls 302 include a "Label" control that provides a selected text label on a visual display, such as the label "Credit Card Submission" 304 in Fig. 5. Another example of a higher level visual control 302 is a "Textbox", which allows data to be entered in a data field such as is indicated at 250 in Fig. 5. The existing visual controls 302 are also well known. In the first approach for extending server side plug-in module controls to include recognition and/or audible prompting, each of the visual controls 302 would include further parameters or attributes related to recognition or audible prompting. In the case of the "Label" control, which otherwise provides selected text on a visual display, further attributes may include whether an audio data file will be rendered or text-to-speech conversion will be employed, as well as the location of this data file. A library 300B, similar to library 300A, includes further markup information for performing recognition and/or audible prompting. Each of the visual controls 302 is coded so as to provide this information to the controls 300B as appropriate to perform the particular task related to recognition or audible prompting.

As another example, the "Textbox" control, which generates an input field on a visual display and allows the user of the client device 30 to enter information, would also include appropriate recognition or audible prompting parameters or attributes such as the grammar to be used for recognition. It should be noted that the recognition or audible prompting parameters are optional and need not be used if recognition or audible prompting is not otherwise desired.
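A minimal sketch of the first approach, assuming a hypothetical `Label` class whose optional `promptText` and `promptAudioUrl` attributes play the role of the added prompting parameters. The two render methods loosely mirror the split between libraries 300A and 300B; the markup strings are illustrative only.

```javascript
// Hypothetical sketch of the first approach (Fig. 6): an existing visual
// "Label" control gains optional prompting attributes.
class Label {
  constructor({ id, text, promptText = null, promptAudioUrl = null }) {
    Object.assign(this, { id, text, promptText, promptAudioUrl });
  }

  // Visual markup, as a conventional label control would produce it
  // (the role of library 300A).
  renderVisual() {
    return `<span id="${this.id}">${this.text}</span>`;
  }

  // Prompting markup (the role of library 300B): a prerecorded audio file
  // if one is given, else inline text for text-to-speech, else nothing.
  renderPrompt() {
    if (this.promptAudioUrl) return `<prompt><audio src="${this.promptAudioUrl}"/></prompt>`;
    if (this.promptText) return `<prompt>${this.promptText}</prompt>`;
    return "";
  }

  render() {
    return this.renderVisual() + this.renderPrompt();
  }
}
```

Because both prompting attributes default to null, a page that never sets them renders exactly as before, reflecting the point that the recognition and prompting parameters are optional.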

In general, if a control at level 302 includes parameters that pertain to visual aspects, the control will access and use the library 300A. Likewise, if the control includes parameters pertaining to recognition and/or audible prompting, the control will access or use the library 300B. It should be noted that libraries 300A and 300B have been illustrated separately in order to emphasize the additional information present in library 300B; a single library having the information of libraries 300A and 300B can also be implemented.

In this approach, each of the current or prior art visual controls 302 is extended to include appropriate recognition/audible prompting attributes. The controls 302 can be formed in a library. The server side plug-in module 209 accesses the library for markup information. Execution of the controls generates a client side markup page, or a portion thereof, with the provided parameters.
In a second approach, illustrated in Fig. 7, new visual, recognition/audible prompting controls 304 are provided such that the controls 304 are a subclass relative to visual controls 302, wherein recognition/audible prompting functionality or markup information is provided at controls 304. In other words, a new set of controls 304 is provided for recognition/audible prompting and includes appropriate parameters or attributes to perform the desired recognition, or audible prompting related to a recognition task, on the client device 30. The controls 304 use the existing visual controls 302 to the extent that visual information is rendered or obtained through a display. For instance, a control "SpeechLabel" at level 304 uses the "Label" control at level 302 to provide an audible rendering and/or visual text rendering. Likewise, a "SpeechTextbox" control would associate a grammar and related recognition resources and processing with an input field. Like the first approach, the attributes for controls 304 include where the grammar is located for recognition, the inline text for text-to-speech conversion, or the location of a prerecorded audio data file to be rendered directly or of a text file to be rendered through text-to-speech conversion. The second approach is advantageous in that interactions of the recognition controls 304 with the visual controls 302 are through parameters or attributes, and thus changes in the visual controls 302 may not require any changes in the recognition controls 304, provided the parameters or attributes interfacing between the controls 304 and 302 are still appropriate. However, with the creation of further visual controls 302, a corresponding recognition/audible prompting control at level 304 may also have to be written.
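The second approach can be sketched with ordinary class inheritance. `Textbox` and `SpeechTextbox` below are hypothetical stand-ins for controls 302 and 304, and the `<listen>`, `<grammar>` and `<bind>` elements they emit are illustrative, loosely SALT-like, and not a definitive syntax.

```javascript
// Hypothetical visual control (level 302): renders a plain input field.
class Textbox {
  constructor(id) {
    this.id = id;
  }
  render() {
    return `<input type="text" id="${this.id}"/>`;
  }
}

// Hypothetical speech control (level 304): a subclass that reuses the visual
// rendering and adds a grammar plus a binding of the result to the field.
class SpeechTextbox extends Textbox {
  constructor(id, grammarUrl) {
    super(id);
    this.grammarUrl = grammarUrl; // where the recognition grammar is located
  }
  render() {
    return (
      super.render() +
      `<listen><grammar src="${this.grammarUrl}"/>` +
      `<bind targetElement="${this.id}"/></listen>`
    );
  }
}
```

The subclass depends on its parent only through `render()` and the `id`, which is the parameter-level coupling the text describes; a new visual control, however, would still need its own speech subclass.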

A third approach is illustrated in Fig. 8. Generally, controls 306 of the third approach are separate from the visual controls 302, but are associated selectively therewith as discussed below. In this manner, the controls 306 do not directly build upon the visual controls 302, but rather provide recognition/audible prompting enablement without having to rewrite the visual controls 302. The controls 306, like the controls 302, use a library 300. In this embodiment, library 300 includes both visual and recognition/audible prompting markup information and as such is a combination of libraries 300A and 300B of Fig. 6.

There are significant advantages to this third approach. Firstly, the visual controls 302 do not need to be changed in content. Secondly, the controls 306 can form a single module which is consistent and does not need to change according to the nature of the speech-enabled control 302. Thirdly, the process of speech enablement, that is, the explicit association of the controls 306 with the visual controls 302, is fully under the developer's control at design time, since it is an explicit and selective process. This also makes it possible for the markup language of the visual controls to receive input values from multiple sources, such as through recognition provided by the markup language generated by controls 306, or through a conventional input device such as a keyboard. In short, the controls 306 can be added to an existing visual application authoring page of the server side plug-in module 209. The controls 306 provide a new modality of interaction (i.e. recognition and/or audible prompting) for the user of the client device 30, while reusing the visual controls' application logic and visual input/output capabilities. Since the controls 306 can be associated with the visual controls 302, where the application logic can be coded, controls 306 may hereinafter be referred to as "companion controls 306" and the visual controls 302 as "primary controls 302". It should be noted that these references are provided for purposes of distinguishing controls 302 and 306 and are not intended to be limiting. For instance, the companion controls 306 could be used to develop or author a website that does not include visual renderings, such as a voice-only website. In such a case, certain application logic could be embodied in the companion control logic.
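The explicit, design-time association of the third approach can be sketched as follows, assuming hypothetical `PrimaryTextbox` and `CompanionControl` classes and a `Map` of page controls keyed by id (roughly what an id-listing association parameter would refer to). The primary control is untouched; it simply receives values either from the keyboard or from the companion's recognition path.

```javascript
// Hypothetical primary control: an unchanged visual control holding a value.
class PrimaryTextbox {
  constructor(id) {
    this.id = id;
    this.value = "";
  }
  // Conventional input path, e.g. keyboard entry.
  type(text) {
    this.value = text;
  }
}

// Hypothetical companion control, explicitly associated by id at design time.
class CompanionControl {
  constructor(targetIds) {
    this.targetIds = targetIds;
  }
  // Look up the associated primary controls; fail loudly on a bad id.
  resolve(page) {
    return this.targetIds.map((id) => {
      const control = page.get(id);
      if (!control) throw new Error(`no primary control with id ${id}`);
      return control;
    });
  }
  // Recognition path: write a recognized value into each associated control.
  deliver(page, recognized) {
    for (const control of this.resolve(page)) control.value = recognized;
  }
}
```

Both input paths end in the same `value` field, which is the sense in which the visual control's application logic is reused rather than rewritten.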

A first exemplary set of companion controls 306 is further illustrated in Fig. 9. The set of companion controls 306 can be grouped as output controls 308 and input controls 310. Output controls 308 provide "prompting" client side markups, which typically involve playing a prerecorded audio file or text for text-to-speech conversion, with the data either included directly in the markup or referenced via a URL. Although a single output control can be defined with parameters to handle all audible prompting, in the exemplary embodiment the forms or types of audible prompting in a human dialog are formed as separate controls. In particular, the output controls 308 can include a "Question" control 308A, a "Confirmation" control 308B and a "Statement" control 308C, which will be discussed in detail below. Likewise, the input controls 310 can also form or follow human dialog and include an "Answer" control 310A and a "Command" control 310B. The input controls 310 are discussed below, but generally the input controls 310 associate a grammar with expected or possible input from the user of the client device 30.

Although the question control 308A, confirmation control 308B, statement control 308C, answer control 310A, command control 310B and other controls, as well as the general structure of these controls, their parameters and their event handlers, are specifically discussed with respect to use as companion controls 306, it should be understood that these controls, the general structure, parameters and event handlers can be adapted to provide recognition and/or audible prompting in the other two approaches discussed above with respect to Figs. 6 and 7. For instance, the parameter "ControlsToSpeechEnable", which comprises one exemplary mechanism to form the association between a companion control and a visual control, would not be needed when embodied in the approaches of Figs. 6 and 7.

In a multimodal application, at least one of the output controls 308 or one of the input controls 310 is associated with a primary or visual control 302. In the embodiment illustrated, the output controls 308 and input controls 310 are arranged or organized under a "Question/Answer" (hereinafter also "QA") control 320. QA control 320 is executed on the web server 202, which means it is defined on the application development web page held on the web server using the server-side markup formalism (ASP, JSP or the like), but is output as a different form of markup to the client device 30 or telephony voice browser 212. Although Fig. 9 illustrates the QA control as formed of all of the output controls 308 and the input controls 310, it should be understood that these are merely options, one or more of which may be included in a QA control.

At this point it may be helpful to explain use of the controls 308 and 310 in terms of application scenarios. Referring to Fig. 10, in a voice-only application QA control 320 could comprise a single question control 308A and an answer control 310A. The question control 308A contains one or more prompt objects or controls 322, while the answer control 310A can define a grammar through grammar object or control 324 for recognition of the input data and related processing on that input. Line 326 represents the association of the QA control 320 with the corresponding primary control 302, if used. In a multimodal scenario, where the user of the client device 30 may touch the visual textbox, for example with a "TapEvent", an audible prompt may not be necessary. For example, for a primary control comprising a textbox having visual text that indicates what the user of the client device should enter in the corresponding field, a corresponding QA control 320 may or may not have a corresponding prompt such as an audio playback or a text-to-speech conversion, but would have a grammar corresponding to the expected value for recognition, and event handlers 328 to process the input, or to process other recognizer events such as no speech detected, speech not recognized, or events fired on timeouts (as illustrated in "Eventing" below).
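The event handlers 328 can be pictured as a dispatch table keyed by recognizer outcome. In this sketch the event type names (`reco`, `silence`, `noreco`, `timeout`) and the `makeEventRouter` helper are assumptions for illustration; they are not taken from the control definitions in Appendix A.

```javascript
// Hypothetical sketch of event handlers attached to a QA control: each
// recognizer outcome type is routed to its own handler.
function makeEventRouter(handlers) {
  return function route(event) {
    // Only the handler object's own keys count as registered event types.
    const handler = Object.prototype.hasOwnProperty.call(handlers, event.type)
      ? handlers[event.type]
      : undefined;
    if (typeof handler !== "function") {
      throw new Error(`no handler for recognizer event: ${event.type}`);
    }
    return handler(event);
  };
}
```

A multimodal QA control with no prompt would still register these handlers, since silence, misrecognition and timeouts can occur whether or not anything was spoken to the user.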

In general, the QA control, through the output controls 308 and input controls 310 and additional logic, can perform one or more of the following: provide output audible prompting; collect input data; perform confidence validation of the input result; allow additional types of input, such as "help" commands or commands that allow the user of the client device to navigate to other selected areas of the website; and allow confirmation of input data and control of dialog flow at the website, to name a few. In short, the QA control 320 contains all the controls related to a specific topic. In this manner, a dialog is created through use of the controls with respect to the topic in order to inform, to obtain information, to confirm validity, or to repair a dialog or change the topic of conversation.
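Confidence validation and confirmation can be combined into a single decision, sketched here under stated assumptions: the hypothetical `nextDialogMove` function compares a recognition confidence against two thresholds (the 0.8 and 0.4 defaults are illustrative, in the spirit of the confidence level thresholds mentioned for the first approach) and chooses to accept the value, confirm it with the user, or ask again.

```javascript
// Hypothetical sketch of confidence validation: accept, confirm or re-ask,
// based on two illustrative thresholds (assumed values, not from the text).
function nextDialogMove(result, { accept = 0.8, reject = 0.4 } = {}) {
  if (result.confidence >= accept) return { move: "accept", value: result.text };
  if (result.confidence >= reject) return { move: "confirm", value: result.text };
  return { move: "reask" }; // too unreliable to confirm; ask the question again
}
```

In a QA control, the "confirm" outcome would correspond to activating a confirmation control 308B, and "reask" to repeating the question control 308A.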

In one method of development, the application developer can define the visual layout of the application using the visual controls 302. The application developer can then define the spoken interface of the application using companion controls 306 (embodied as QA control 320, or output controls 308 and input controls 310). As illustrated in FIGS. 9 and 10, each of the companion controls 306 is then linked or otherwise associated with the corresponding primary or visual control 302 to provide recognition and audible prompting. Of course, if desired, the application developer can define or encode the application by switching between visual controls 302 and companion controls 306, forming the links therebetween, until the application is completely defined or encoded.

At this point, it may be helpful to provide a short description of each of the output controls 308 and input controls 310. Detailed descriptions for this embodiment are provided below in Appendix A.

Questions, Answers and Commands

Generally, as indicated above, the question controls 308A and answer controls 310A in a QA control 320 hold the prompt and grammar resources relevant to the primary control 302, and related binding (associating recognition results with input fields of the client-side markup page) and processing logic. The presence or absence of question controls 308A and answer controls 310A determines whether speech output or recognition input is enabled on activation. Command controls 310B and user initiative answers are activated by specification of the Scope property on the answer controls 310A and command controls 310B.

In simple voice-only applications, a QA control 320 will typically hold one question control or object 308A and one answer control or object 310A. Although not shown in the example below, command controls 310B may also be specified, e.g. Help, Repeat, Cancel, etc., to enable user input which does not directly relate to the answering of a particular question.

A typical 'regular' QA control for voice-only 
dialog is as follows: 

<Speech:QA
    id="QA_WhichOne"
    ControlsToSpeechEnable="textBox1"
    runat="server">
    <Question>
        <prompt>Which one do you want?</prompt>
    </Question>
    <Answer>
        <grammar src="whichOne.gram"/>
    </Answer>
</Speech:QA>

(The examples provided herein are written in the ASP.NET framework by way of example only and should not be considered as limiting the present invention.)

In this example, the QA control can be identified by its "id", while the association of the QA control with the desired primary or visual control is obtained through the parameter "ControlsToSpeechEnable", which identifies one or more primary controls by their respective identifiers. If desired, other well-known techniques can be used to form the association. For instance, direct, implicit associations are available through the first and second approaches described above, or separate tables can be created and used to maintain the associations. The parameter "runat" instructs the web server that this code should be executed at the web server 202 to generate the correct markup.

A QA control might also hold only a statement control 308C, in which case it is a prompt-only control without active grammars (e.g. for a welcome prompt). Similarly, a QA control might hold only an answer control 310A, in which case it may be a multimodal control, whose answer control 310A activates its grammars directly as the result of an event from the GUI, or a scoped mechanism (discussed below) for user initiative.
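The rule that a QA control's children determine what it does on activation can be sketched as a function over a hypothetical child-list representation. It treats a question or statement control as enabling speech output and an answer control as enabling recognition input; command controls are left out because, as noted above, they are activated through their Scope property.

```javascript
// Hypothetical sketch: derive what a QA control enables on activation from
// the kinds of child controls it holds.
function capabilities(qa) {
  const kinds = new Set(qa.children.map((child) => child.kind));
  return {
    // A question or statement control means something is spoken.
    speechOutput: kinds.has("Question") || kinds.has("Statement"),
    // An answer control means grammars are activated for recognition.
    recognitionInput: kinds.has("Answer"),
  };
}
```

The three cases in the text map onto this directly: question plus answer (regular voice-only), statement only (prompt-only, e.g. a welcome prompt), and answer only (multimodal, activated by a GUI event).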
It should also be noted that a QA control 320 may hold multiple output controls 308 and input controls 310, such as multiple question controls 308A and multiple answer controls 310A. This allows an author to describe interactional flow about the same entity within the same QA control. This is particularly useful for more complex voice-only dialogs. So a mini-dialog, which may involve different kinds of question and answer (e.g. asking, confirming, giving help, etc.), can be specified within the wrapper of the QA control associated with the visual control which represents the dialog entity. A complex QA control is illustrated in Fig. 10.

The foregoing represents the main features of the QA control. Each feature is described from a functional perspective below.

Answer Control 

The answer control 310A abstracts the notion of grammars, binding and other recognition processing into a single object or control. Answer controls 310A can be used to specify a set of possible grammars relevant to a question, along with binding declarations and relevant scripts. Answer controls for multimodal applications such as "Tap-and-Talk" are activated and deactivated by GUI browser events. The following example illustrates an answer control 310A used in a multimodal application to select a departure city on the "mouseDown" event of the textbox "txtDepCity", and write its value into the primary textbox control:



<Speech:QA
    ControlsToSpeechEnable="txtDepCity"
    runat="server">
    <Answer id="AnsDepCity"
        StartEvent="onMouseDown"
        StopEvent="onMouseUp">
        <grammar src="/grammars/depCities.gram"/>
        <bind value="//sml/DepCity"
            targetElement="txtDepCity"/>
    </Answer>
</Speech:QA>

Typical answer controls 310A in voice-only applications are activated directly by question controls 308A, as described below.

The answer control further includes a mechanism to associate a received result with the primary controls. Herein, binding places the values in the primary controls; however, in another embodiment, the association mechanism may allow the primary control to look at or otherwise access the recognized results.
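Binding can be sketched as reading a value out of the recognition result and writing it into the target primary control. The sketch assumes the result has already been parsed into a nested JavaScript object, and the hypothetical `selectPath` handles only plain slash-separated child steps: leading slashes are simply stripped, unlike real XPath, where `//` matches at any depth. The `bind` and `selectPath` names are likewise hypothetical.

```javascript
// Hypothetical sketch of binding a recognition result to a primary control.
// Only plain slash-separated child steps are supported; leading slashes are
// stripped, so "//sml/DepCity" is read as result.sml.DepCity.
function selectPath(result, path) {
  const steps = path.replace(/^\/+/, "").split("/");
  let node = result;
  for (const step of steps) {
    if (node === null || typeof node !== "object" || !(step in node)) {
      return undefined; // path not present in this result
    }
    node = node[step];
  }
  return node;
}

// Write the selected value into the target control; false if nothing matched.
function bind(result, { value, targetElement }, controls) {
  const selected = selectPath(result, value);
  if (selected === undefined) return false;
  controls[targetElement].value = selected;
  return true;
}
```

The alternative embodiment mentioned above would skip `bind` and let the primary control call something like `selectPath` on the result itself.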

Question Control 

Question controls 308A abstract the notion of the prompt tags into an object which contains a selection of possible prompts and the answer controls 310A which are considered responses to the question. Each question control 308A is able to specify which answer control 310A it activates on its execution. This permits appropriate response grammars to be bundled into answer controls 310A, which reflect relevant question controls 308A.
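The relationship between a question control and the answer controls it activates can be sketched as a lookup, assuming a hypothetical object model in which `answers` holds the ids listed in the question's Answers attribute. Executing the question yields its prompt together with only the grammars of those answers.

```javascript
// Hypothetical sketch: executing a question control yields its prompt plus
// the grammars of exactly the answer controls it lists.
function executeQuestion(question, answersById) {
  const active = question.answers.map((id) => {
    const answer = answersById[id];
    if (!answer) {
      throw new Error(`question ${question.id} references unknown answer ${id}`);
    }
    return answer;
  });
  return {
    prompt: question.prompt,
    grammars: active.map((a) => a.grammar), // only these are listened for
  };
}
```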
The following question control 308A might be used in a voice-only application to ask for a Departure City:

<Speech:QA id="QADepCity"
    ControlsToSpeechEnable="txtDepCity"
    runat="server">
    <Question id="Q1" Answers="AnsDepCity">
        <prompt>
            Please give me the departure city.
        </prompt>
    </Question>

    <Answer id="AnsDepCity" ... />
</Speech:QA>

In the example below, different prompts can be called depending on an internal condition of the question control 308A. The ability to specify conditional tests on the prompts inside a question control 308A means that changes in wording can be accommodated within the same functional unit of the question control 308A.

<Speech:QA id="QADepCity"
    ControlsToSpeechEnable="txtDepCity"
    runat="server">
    <Question id="Q1" Answers="AnsDepCity">
        <prompt count="1">
            Now I need to get the departure city.
            Where would you like to fly from?
        </prompt>
        <prompt count="2">
            Which departure city?
        </prompt>
    </Question>
    <Answer id="AnsDepCity" ... />
</Speech:QA>
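One plausible reading of the `count` attribute is that it selects a prompt by how many times the question has been asked. The hypothetical `selectPrompt` function below implements that reading; reusing the highest-count prompt once the question has been asked more times than there are prompts is an assumption, not something stated in the text.

```javascript
// Hypothetical sketch: choose a prompt by how many times the question has
// been asked, using the highest count that does not exceed timesAsked.
function selectPrompt(prompts, timesAsked) {
  const sorted = [...prompts].sort((a, b) => a.count - b.count);
  let chosen = sorted[0]; // fall back to the lowest-count prompt
  for (const p of sorted) {
    if (p.count <= timesAsked) chosen = p;
  }
  return chosen.text;
}
```

Under this reading, the longer explanatory prompt is heard on the first attempt and the terse "Which departure city?" on every retry.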
Conditional QA Control

The following example illustrates how to determine whether or not to activate a QA control based upon information known to the application. The example is a portion of a survey application. The survey is gathering information from employees regarding the mode of transportation they use to get to work.

The portion of the survey first asks whether or not the user rides the bus to work. If the answer is:

- Yes, the next question asks how many days last week the user rode the bus.
- No, the "number of days rode the bus" question is bypassed.

<asp:Label id="lblDisplay1"
    text="Do you ride the bus to work?"
    runat="server"/>

<asp:DropDownList id="lstRodeBusYN" runat="server">
    <asp:ListItem selected="true">No</asp:ListItem>
    <asp:ListItem>Yes</asp:ListItem>
</asp:DropDownList>

<Speech:QA id="QA_RideBus"
    ControlsToSpeechEnable="lstRodeBusYN"
    runat="server">
    <SDN:Question id="Q_RideBus">
        <prompt bargeIn="False">
            Do you ride the bus to work?
        </prompt>
    </SDN:Question>
    <SDN:Answer id="A_RideBus"
        autobind="False"
        StartEvent="onMouseDown"
        StopEvent="onMouseUp"
        runat="server"
        onClientReco="ProcessRideBusAnswer">
        <grammar src="..."/> <!-- "yes/no" grammar -->
    </SDN:Answer>
</Speech:QA>

<asp:Label id="lblDisplay2"
    enabled="False"
    text="How many days last week did you ride the bus to work?"
    runat="server"/>

<asp:DropDownList id="lstDaysRodeBus" enabled="False"
    runat="server">
    <asp:ListItem selected="true">0</asp:ListItem>
    <asp:ListItem>1</asp:ListItem>
    <asp:ListItem>2</asp:ListItem>
    <asp:ListItem>3</asp:ListItem>
    <asp:ListItem>4</asp:ListItem>
    <asp:ListItem>5</asp:ListItem>
    <asp:ListItem>6</asp:ListItem>
    <asp:ListItem>7</asp:ListItem>
</asp:DropDownList>

<Speech:QA id="QA_DaysRodeBus"
    ControlsToSpeechEnable="lstDaysRodeBus"
    ClientTest="RideBusCheck"
    runat="server">
    <SDN:Question id="Q_DaysRodeBus">
        <prompt bargeIn="False">
            How many days last week did you ride the bus to work?
        </prompt>
    </SDN:Question>
    <SDN:Answer id="A_DaysRodeBus"
        autobind="False"
        StartEvent="onMouseDown"
        StopEvent="onMouseUp"
        runat="server"
        onClientReco="ProcessDaysRodeBusAnswer">
        <grammar src="..."/> <!-- "numbers" grammar -->
    </SDN:Answer>
</Speech:QA>



<script language= " j script " > 

function ProcessRideBusAnswer ( ) { 
20 <--! using SML attribute of the Event object, 

determine yes or no answer --> 

<--! then select the appropriate item in the 
dropdown listbox --> 

<--! and enable the next label and dropdown 

2 5 listbox if answer is "yes" --> 

if <--! Answer is "yes" --> { 
IstRodeBusYN. selectedlndex=2 
lblDisplay2 . enabled=" true" 

3 0 IstDaysRodeBus . enabled=" true" } 

} 

function RideBusCheck ( ) { 

if IstRodeBusYN. selectedlndex=" 1 " <--! 
35 this is no --> 

then return "False" 
endif 

} 

40 function ProcessDaysRodeBusAnswer ( ) { 

<--! case statement to select proper 
dropdown item - - > 



} 

45 </script> 
In the example provided above, the QA control "QA_DaysRodeBus" is executed based on a boolean parameter "ClientTest", which in this example is set by the function RideBusCheck(). If the function returns false, the QA control is not activated; if it returns true, the QA control is activated. The use of an activation mechanism allows increased flexibility and improved dialog flow in the client side markup page produced. As indicated in Appendix A, many of the controls and objects include an activation mechanism.
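The activation test described above can be sketched as follows. This is a minimal illustration under a hypothetical data model (a QA control as a plain object with an optional `clientTest` function, and the dropdown as a stand-in object); it is not the RunSpeech implementation of Appendix A.

```javascript
// Sketch only: hypothetical data model, not the Appendix A RunSpeech code.
// A QA control object may carry an optional clientTest function that plays
// the role of the ClientTest attribute.
function isQaActivated(qa) {
  // Without a ClientTest the control is always eligible for activation.
  if (typeof qa.clientTest !== "function") return true;
  // With a ClientTest the control is activated only when it returns true.
  return qa.clientTest() === true;
}

// Stand-in for the dropdown: index 0 is "No", index 1 is "Yes".
const lstRodeBusYN = { selectedIndex: 0 };

function RideBusCheck() {
  return lstRodeBusYN.selectedIndex === 1;
}

const qaDaysRodeBus = { id: "QA_DaysRodeBus", clientTest: RideBusCheck };

console.log(isQaActivated(qaDaysRodeBus)); // false: "No" is selected
lstRodeBusYN.selectedIndex = 1;
console.log(isQaActivated(qaDaysRodeBus)); // true: "Yes" is selected
```

Keeping the test as a plain function lets the execution loop re-evaluate it on every dialog turn, so the follow-up question appears only once the earlier answer makes it relevant.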

Command Control 

Command controls 310B are user utterances common in voice-only dialogs which typically have little semantic import in terms of the question asked, but rather seek assistance or effect navigation, e.g. help, cancel, repeat, etc. The Command control 310B within a QA control 306 can be used to specify not only the grammar and associated processing on recognition (rather like an answer control 310A without binding of the result to an input field), but also a 'scope' of context and a type. This allows for the authoring of both global and context-sensitive behavior on the client side markup.

As appreciated by those skilled in the art from the foregoing description, controls 306 can be organized in a tree structure similar to that used in visual controls 302. Since each of the controls 306 is also associated with selected visual controls 302, the organization of the controls 306 can be related to the structure of the controls 302.

The QA controls 302 may be used to speech-enable both atomic controls (textbox, label, etc.) and container controls (form, panel, etc.). This provides a way of scoping behavior and of obtaining modularity of subdialog controls. For example, the scope will allow the user of the client device to navigate to other portions of the client side markup page without completing a dialog.

In one embodiment, "Scope" is determined as a node of the primary controls tree. The following is an example "help" command, scoped at the level of the "Pnl1" container control, which contains two textboxes.

<asp:panel id="Pnl1" ...>
    <asp:textbox id="tb1" ... />
    <asp:textbox id="tb2" ... />
</asp:panel>

<Speech:QA ... >
    <Command
        id="HelpCmd1"
        scope="Pnl1"
        type="help"
        onClientReco="GlobalGiveHelp()">
        <Grammar src="grammars/help.gram" />
    </Command>
</Speech:QA>

<script>
function GlobalGiveHelp() {
    // General help for the Pnl1 panel and every control inside it.
}
</script>
As specified, the "help" grammar will be active in every QA control relating to "Pnl1" and its contents. The GlobalGiveHelp subroutine will execute every time "help" is recognized. To override this and achieve context-sensitive behavior, the same typed command can be scoped to the required level of context:

<Speech:QA ... >
    <Command
        id="HelpCmd2"
        scope="tb2"
        type="help"
        onClientReco="SpecialGiveHelp()">
        <Grammar src="grammars/help.gram" />
    </Command>
</Speech:QA>

<script>
function SpecialGiveHelp() {
    // Context-sensitive help for textbox tb2 only; overrides GlobalGiveHelp there.
}
</script>
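One way to read the scoping rule is "the most specific scope wins": starting from the control in focus, walk up the primary controls tree and use the first command of the requested type scoped at that node. The sketch below assumes that reading; the `parentOf` table, the command objects and `resolveCommand` are hypothetical names for illustration only.

```javascript
// Hypothetical primary-controls tree: child id -> parent id (null at root).
const parentOf = { Pnl1: null, tb1: "Pnl1", tb2: "Pnl1" };

// Hypothetical command table mirroring HelpCmd1 and HelpCmd2 above.
const commands = [
  { id: "HelpCmd1", type: "help", scope: "Pnl1", handler: "GlobalGiveHelp" },
  { id: "HelpCmd2", type: "help", scope: "tb2", handler: "SpecialGiveHelp" },
];

// Walk from the control in focus toward the root and return the first
// command of the requested type scoped at that level, so a narrower scope
// overrides a wider one. Returns null if no command applies.
function resolveCommand(type, focusId) {
  for (let node = focusId; node != null; node = parentOf[node]) {
    const match = commands.find((c) => c.type === type && c.scope === node);
    if (match) return match;
  }
  return null;
}

console.log(resolveCommand("help", "tb1").handler); // GlobalGiveHelp
console.log(resolveCommand("help", "tb2").handler); // SpecialGiveHelp
```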

Confirmation Control 

The QA control 320 can also include a method for simplifying the authoring of common confirmation subdialogs. The following QA control exemplifies a typical subdialog which asks and then confirms a value:

<Speech:QA
    id="qaDepCity"
    controlsToSpeechEnable="txtDepCity"
    runat="server">

    <!-- asking for a value -->
    <Question id="AskDepCity"
        type="ask"
        Answers="AnsDepCity">
        <prompt> Which city? </prompt>
    </Question>

    <Answer id="AnsDepCity"
        confirmThreshold="60">
        <grammar src="grammars/depCity.gram" />
    </Answer>

    <!-- confirming the value -->
    <Confirm id="ConfirmDepCity"
        Answers="AnsConfDepCity">
        <prompt>
            Did you say <value targetElement="txtDepCity/Text" />?
        </prompt>
    </Confirm>

    <Answer id="AnsConfDepCity">
        <grammar src="grammars/YesNoDepCity.gram" />
    </Answer>
</Speech:QA>

In this example, a user response to 'Which city?' which matches the AnsDepCity grammar but whose confidence level does not exceed the confirmThreshold value will trigger the confirm control 308. More flexible methods of confirmation available to the author include mechanisms using multiple question controls and multiple answer controls.

In a further embodiment, additional input controls related to the confirmation control include an accept control, a deny control and a correct control. Each of these controls could be activated (in a manner similar to the other controls) by the corresponding confirmation control and include grammars to accept, deny or correct results, respectively. For instance, users are likely to deny by saying "no", to accept by saying "yes" or "yes + current value" (e.g., "Do you want to go to Seattle?" "Yes, to Seattle"), and to correct by saying "no" + new value (e.g., "Do you want to go to Seattle?" "No, Pittsburgh").
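The accept/deny/correct distinction can be illustrated with a toy classifier. A real system would use the grammars attached to the accept, deny and correct controls; here, as a stated simplification, plain string matching against a hypothetical list of known cities stands in for those grammars.

```javascript
// Toy classifier (assumption): string matching stands in for the accept,
// deny and correct grammars described in the text.
function classifyConfirmationReply(reply, currentValue, knownValues) {
  // Normalize: lower case, punctuation replaced by spaces.
  const text = reply.toLowerCase().replace(/[^a-z ]/g, " ");
  const words = text.split(/\s+/).filter(Boolean);
  // Any known value mentioned in the reply, e.g. "pittsburgh".
  const said = knownValues.find((v) => text.includes(v.toLowerCase()));

  // "yes" or "yes + current value": accept.
  if (words[0] === "yes") return { kind: "accept", value: currentValue };
  // A different value (with or without "no"): correct.
  if (said && said.toLowerCase() !== currentValue.toLowerCase()) {
    return { kind: "correct", value: said };
  }
  // A bare "no": deny, which clears the value.
  if (words[0] === "no") return { kind: "deny", value: null };
  return { kind: "unknown", value: null };
}

const cities = ["Seattle", "Pittsburgh", "San Francisco"];
console.log(classifyConfirmationReply("Yes, to Seattle", "Seattle", cities).kind); // accept
console.log(classifyConfirmationReply("No, Pittsburgh", "Seattle", cities).value); // Pittsburgh
```

Checking for a correction before a bare "no" matters: "No, Pittsburgh" starts with "no" but carries a new value, so it must be treated as a correction rather than a denial.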

Statement Control 

The statement control allows the application developer to provide an output upon execution of the client side markup when a response is not required from the user of the client device 30. An example could be a "Welcome" prompt played at the beginning of execution of a client side markup page.

An attribute can be provided in the statement control to distinguish different types of information to be provided to the user of the client device. For instance, attributes can be provided to denote a warning message or a help message. These types could have different built-in properties such as different voices. If desired, different forms of statement controls can be provided, i.e. a help control, a warning control, etc. Whether provided as separate controls or as attributes of the statement control, the different types of statements have different roles in the dialog created, but share the fundamental role of providing information to the user of the client device without expecting an answer back.
Eventing

Event handlers, as indicated in FIG. 10, are provided in the QA control 320, the output controls 308 and the input controls 310 for actions or inactions of the user of the client device 30 and for operation of the recognition server 204, to name a few; other events are specified in Appendix A. For instance, mumbling, where the speech recognizer detects that the user has spoken but is unable to recognize the words, and silence, where speech is not detected at all, are specified in the QA control 320. These events reference client-side script functions defined by the author. In the multimodal application specified earlier, a simple mumble handler that puts an error message in the textbox could be written as follows:

<Speech:QA
    controlsToSpeechEnable="txtDepCity"
    onClientNoReco="OnMumble()"
    runat="server">
    <Answer id="AnsDepCity"
        StartEvent="onMouseDown"
        StopEvent="onMouseUp">
        <grammar src="/grammars/depCities.gram" />
        <bind value="//sml/DepCity"
            targetElement="txtDepCity" />
    </Answer>
</Speech:QA>

<script>
function OnMumble() {
    // Speech was detected but not recognized: show an error in the textbox.
    txtDepCity.value = "...recognition error...";
}
</script>
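The following sketch shows how a recognition outcome might be routed to the author's handlers. The outcome shape (`speechDetected`, `matched`, `sml`) is an assumption, and `onClientSilence` is a hypothetical name for the QA control's silence handler (modelled on the OnClientSilence property that the text names for the Dtmf object); `onClientReco` and `onClientNoReco` follow the markup above.

```javascript
// Hypothetical recognizer outcome: { speechDetected, matched, sml }.
function dispatchRecognitionEvent(qa, outcome) {
  if (!outcome.speechDetected) {
    // Silence: no speech was detected at all.
    if (qa.onClientSilence) qa.onClientSilence();
    return "silence";
  }
  if (!outcome.matched) {
    // Mumble: speech was detected but no grammar matched.
    if (qa.onClientNoReco) qa.onClientNoReco();
    return "noreco";
  }
  // Successful recognition: hand the SML result to the author's handler.
  if (qa.onClientReco) qa.onClientReco(outcome.sml);
  return "reco";
}

// Stand-in for the txtDepCity textbox and the QA control's handler.
const txtDepCity = { value: "" };
const qaDepCity = {
  onClientNoReco: function OnMumble() {
    txtDepCity.value = "...recognition error...";
  },
};

console.log(dispatchRecognitionEvent(qaDepCity, { speechDetected: true, matched: false })); // noreco
console.log(txtDepCity.value); // ...recognition error...
```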
Control Execution Algorithm

In one embodiment, a client-side script or module (herein referred to as "RunSpeech") is provided to the client device. The purpose of this script is to execute dialog flow via logic which is specified in the script when executed on the client device 30, i.e. when the markup pertaining to the controls is activated for execution on the client due to values contained therein. The script allows multiple dialog turns between page requests and, therefore, is particularly helpful for control of voice-only dialogs such as through telephony browser 216. The client-side script RunSpeech is executed in a loop on the client device 30 until a completed form is submitted, or a new page is otherwise requested from the client device 30.

It should be noted that in one embodiment, the controls can activate each other (e.g. a question control activating a selected answer control) due to values when executed on the client. However, in a further embodiment, the controls can "activate" each other in order to generate appropriate markup, in which case server-side processing may be implemented.

Generally, in one embodiment, the algorithm generates a dialog turn by outputting speech and recognizing user input. The overall logic of the algorithm is as follows for a voice-only scenario:

1. Find the next active output companion control;
2. If it is a statement, play the statement and go back to 1; if it is a question or a confirm, go to 3;
3. Collect expected answers;
4. Collect commands;
5. Play the output control and listen in for input;
6. Activate the recognized Answer or Command object or, if none is recognized, issue an event;
7. Go back to 1.
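The seven voice-only steps can be sketched as a loop. This is a simplified illustration under a hypothetical control model (output controls with `role`, `prompt`, `isActive()` and `answers`; answers and commands with `matches()` and `activate()`), with a scripted `listen` function standing in for the recognizer. It is not the RunSpeech code of Appendix A.

```javascript
// Sketch of the voice-only loop under a hypothetical control model.
function runSpeechVoiceOnly(outputs, commands, listen, log) {
  for (let turn = 0; turn < 50; turn++) { // guard against endless dialogs
    // 1. Find the next active output control.
    const out = outputs.find((c) => c.isActive());
    if (!out) return; // nothing left to ask: the form is complete
    // 2. A statement is played at once, then the loop restarts.
    if (out.role === "statement") {
      log.push(out.prompt);
      out.done = true;
      continue;
    }
    // 3-4. Collect the expected answers and the commands.
    const candidates = out.answers.concat(commands);
    // 5. Play the prompt and listen for input.
    log.push(out.prompt);
    const heard = listen();
    // 6. Activate the recognized Answer/Command, or issue an event.
    const hit = candidates.find((c) => c.matches(heard));
    if (hit) hit.activate(heard);
    else log.push("[noreco event]");
    // 7. Go back to step 1.
  }
}

// Usage: a welcome statement followed by one question.
const form = { city: null };
const welcome = {
  role: "statement", prompt: "Welcome", done: false,
  isActive() { return !this.done; },
};
const askCity = {
  role: "question", prompt: "Where would you like to go?",
  isActive: () => form.city === null, // active while the value is empty
  answers: [{
    matches: (s) => /^(seattle|boston)$/i.test(s),
    activate: (s) => { form.city = s; },
  }],
};
const userTurns = ["(mumble)", "Seattle"];
const log = [];
runSpeechVoiceOnly([welcome, askCity], [], () => userTurns.shift(), log);
console.log(log.join(" | "));
```

Because each control decides its own activity (`isActive`), the loop itself stays generic: the question is re-asked after a mumble simply because its value is still empty.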
In the multimodal case, the logic is simplified to the following algorithm:

1. Wait for a triggering event, i.e., the user tapping on a control;
2. Collect expected answers;
3. Listen in for input;
4. Activate the recognized Answer object or, if none, throw an event;
5. Go back to 1.
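In the multimodal case, the loop is driven by tap events rather than by prompts. The sketch below assumes a hypothetical map from primary control ids to their QA companions, and passes the recognizer's output in as the `heard` argument for simplicity.

```javascript
// Sketch of the multimodal loop (hypothetical model): each tap on a
// primary control triggers one recognition turn for its QA companion.
function makeMultimodalRunner(qaByControl) {
  // Step 1 belongs to the caller: onTap runs whenever a tap event arrives.
  return function onTap(controlId, heard) {
    const qa = qaByControl[controlId];
    if (!qa) return "ignored"; // the control is not speech-enabled
    // Steps 2-3: collect the expected answers and test the input.
    const hit = qa.answers.find((a) => a.matches(heard));
    // Step 4: activate the recognized Answer, or throw an event.
    if (hit) {
      hit.activate(heard);
      return "reco";
    }
    return "noreco";
  };
}

const txtDepCity = { value: "" };
const onTap = makeMultimodalRunner({
  txtDepCity: {
    answers: [{
      matches: (s) => /^(seattle|boston)$/i.test(s),
      activate: (s) => { txtDepCity.value = s; },
    }],
  },
});
console.log(onTap("txtDepCity", "Seattle"), txtDepCity.value); // reco Seattle
```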

The algorithm is relatively simple because, as noted above, controls contain built-in information about when they can be activated. The algorithm also makes use of the role of the controls in the dialogue. For example, statements are played immediately, while questions and confirmations are only played once the expected answers have been collected.

In a further embodiment, implicit confirmation can be provided whereby the system confirms a piece of information and asks a question at the same time. For example, the system could confirm the arrival city of a flight and ask for the travel date in one utterance: "When do you want to go to Seattle?" (i.e. asking 'when' and implicitly confirming 'destination: Seattle'). If the user gives a date, then the city is considered implicitly accepted since, if the city was wrong, users would have immediately challenged it. In this scenario, it becomes clear that knowledge of what a user is trying to achieve is vitally important: are they answering the question, correcting the value, or asking for help? By using the role of the user input in the dialogue, the system can know when to implicitly accept a value.
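The decision rule for implicit confirmation can be sketched as a switch on the role of the user's input. The `parsed` object is a hypothetical recognizer output tagged with the role of the input ("answer", "correct" or a command); the state fields are likewise illustrative.

```javascript
// parsed: hypothetical recognizer output tagged with its dialog role, e.g.
// { role: "answer", date }, { role: "correct", city } or { role: "command", name }.
function applyImplicitConfirmation(state, parsed) {
  switch (parsed.role) {
    case "answer": // the user answered the new question: ground the old value
      state.date = parsed.date;
      state.cityConfirmed = true;
      break;
    case "correct": // the user challenged the old value: replace it, unconfirmed
      state.city = parsed.city;
      state.cityConfirmed = false;
      break;
    default: // e.g. a "help" command: neither answer nor challenge
      break;
  }
  return state;
}

// After "When do you want to go to Seattle?" the user gives only a date.
const state = { city: "Seattle", cityConfirmed: false, date: null };
applyImplicitConfirmation(state, { role: "answer", date: "2024-05-01" });
console.log(state.cityConfirmed); // true
```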

In summary, a dialog is created due to the role of each control in the dialog and its relationship with other controls, wherein the algorithm executes the controls and thus manages the dialog. Each control contains information based on its type which is used by the execution algorithm to select (i.e. make active for execution) a given control according to whether or not it serves a useful purpose at that point in the dialog on the client. For example, confirmation controls are only active when there is a value to confirm and the system does not have sufficient confidence in that value to proceed. In a further implementation, most of these built-in pieces of information can be overridden or otherwise adapted by application developers.
The following table summarizes the controls, their corresponding role in the dialog and the relationship with other controls.

Control       Role in dialogue                                        Relationship with other controls
Statement     output: present information to users                    (none)
Question      output: ask question                                    selects expected Answers as a response
Confirmation  output: confirm a value obtained from the user          selects potential input controls as a
                                                                      response, typically Accept, Deny, Correct
Answer        input: provide an answer to a question                  selected by Question/Confirmation
Command       input: seek to repair a dialog, or change the           scoped to other controls
              topic of conversation
Accept        input: confirm a value in response to a confirmation    selected by a confirmation
Deny          input: deny a value in response to a confirmation       selected by a confirmation
Correct       input: correct a value in response to a confirmation    selected by a confirmation
QA                                                                    wrapper: contains all the controls related
                                                                      to a specific topic



The use of these controls may be explained with an illustration of a simple human/computer dialog. In the dialog below, each dialog turn on the part of the System or the User is characterized according to the control (indicated in parentheses) which reflects its purpose in the dialog.

1. System (Statement): "Welcome to the travel booking service."
2. System (Question): "Where would you like to go?"
3. User (Answer): "San Francisco."
4. System (Confirmation): "Did you say Seattle?"
5. User (Deny): "No."
6. System (Question): "Where would you like to go?"
7. User (Answer): "San Francisco."
8. System (Confirmation): "Did you say Seattle?"
9. User (Correct): "I said San Francisco."
10. System (Confirmation): "Did you say San Francisco?"
11. User (Accept): "Yes."
12. System (Question): "When would you like to leave?"
13. User (Command): "Help."

Turn 1 is a statement on the part of the System. Since a statement control activates no answer controls in response, the system does not expect input. The system goes on to activate a question control at turn 2. This in turn activates a set of possible answer controls, including one which holds a grammar containing the cities available through the service, including "San Francisco", "Seattle", etc., which permits the user to provide such a city in turn 3.

The user's turn 3 is misrecognized by the system. Although the system believes it has a value from an answer control for the city, its confidence in that value is low (rightly so, since it has recognized incorrectly). This low confidence value in a just-received answer control is sufficient information for RunSpeech to trigger a confirmation control on the part of the system, as generated at turn 4. The confirmation control in turn activates a deny control, a correct control and an accept control and makes their respective grammars available to recognize the user's next turn. User turns 5, 9 and 11 illustrate example responses for these controls. Turn 5 of the user simply denies the value with "no". This has the effect of removing the value from the system, so the next action of RunSpeech is to ask the question again to re-obtain the value (turn 6).

Turns 7 and 8 return us to a confirmation control, as with turns 3 and 4.

User turn 9 is a correct control, which has again been activated as a possible response to the confirmation control. A correct control not only denies the value undergoing confirmation, it also provides a new value. So user turn 9 is recognized by the system as a correct control with a new value which, correctly this time, is recognized as "San Francisco".

The system's confidence in the new value is low, however, and yet another confirmation control is generated at turn 10. This in turn activates accept, deny and correct controls in response, and user turn 11 ("Yes") matches an accept control grammar. The recognition of the accept control has the effect of 'grounding' the system's belief in the value which it is trying to obtain, and so RunSpeech is now able to select other empty values to obtain. In turn 12, a new question control is output which asks for a date value. The user's response this time (turn 13) is a command: "help". Command controls are typically activated in global fashion, that is, independently of the different question controls and confirmation controls on the part of the system. In this way the user is able to ask for help at any time, as he does in turn 13. Command controls may also be more sensitively enabled by a mechanism that scopes their activation according to which part of the primary control structure is being talked about.

Referring back to the algorithm, in one exemplary embodiment, the client-side script RunSpeech examines the values inside each of the primary controls and an attribute of the QA control, and any selection test of the QA controls on the current page, and selects a single QA control for execution. For example, within the selected QA control, a single question and its corresponding prompt are selected for output, and then a grammar related to typical answers to the corresponding question is activated. Additional grammars may also be activated in parallel, allowing other commands (or other answers) which are indicated as being allowable. Assuming recognition has been made and any further processing on the input data is complete, the client-side script RunSpeech will begin again to ascertain which QA control should be executed next. An exemplary implementation and algorithm of RunSpeech is provided in Appendix A.
It should be noted that the use of the controls and the RunSpeech algorithm or module is not limited to the client/server application described above, but rather can be adapted for use with other application abstractions. For instance, an application such as VoiceXML, which runs only on the client device 30 or telephony voice browser 212, could conceivably include further elements or controls, such as the question and answer controls provided above, as part of the VoiceXML browser, operating in the same manner. In this case the mechanisms of the RunSpeech algorithm described above could be executed by default by the browser without the necessity for extra script. Similarly, other platforms such as finite state machines can be adapted to include the controls and RunSpeech algorithm or module herein described.

Synchronization

As noted above, the companion controls 306 are associated with the primary controls 302 (the existing controls on the page). As such, the companion controls 306 can re-use the business logic and presentation capabilities of the primary controls 302. This is done in two ways: storing values in the primary controls 302 and notifying the primary controls 302 of the changes.
The companion controls 306 synchronize or associate their values with the primary controls 302 via a mechanism called binding. Binding puts values retrieved from the recognizer into the primary controls 302, for example putting text into a textbox, herein exemplified with the answer control. Since primary controls 302 are responsible for visual presentation, this provides visual feedback to the users in multimodal scenarios.
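Binding can be sketched as "select a value from the recognition result, then write it into the target primary control". As a stated simplification, the SML result is modelled here as a nested plain object rather than an XML document, and the `value` path is resolved step by step from the root (real XPath `//` semantics are not implemented). The names `selectSmlValue` and `bind` are hypothetical.

```javascript
// Assumption: the SML result is modelled as a nested object such as
// { sml: { DepCity: "Seattle" } }. The path is resolved from the root,
// which is a simplification of XPath, not an implementation of it.
function selectSmlValue(result, path) {
  const steps = path.replace(/^\/+/, "").split("/");
  let node = result;
  for (const step of steps) {
    if (node == null || !(step in Object(node))) return undefined;
    node = node[step];
  }
  return node;
}

// Copy the selected value into the primary control, like <bind>.
// Returns false when the value or the target control is missing.
function bind(result, binding, controls) {
  const value = selectSmlValue(result, binding.value);
  const target = controls[binding.targetElement];
  if (value === undefined || !target) return false;
  target.value = value;
  return true;
}

const controls = { txtDepCity: { value: "" } };
bind({ sml: { DepCity: "Seattle" } },
     { value: "//sml/DepCity", targetElement: "txtDepCity" }, controls);
console.log(controls.txtDepCity.value); // Seattle
```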

The companion controls 306 also offer a mechanism to notify the primary controls 302 that they have received an input via the recognizer. This allows the primary controls 302 to take actions, such as invoking the business logic. (Since the notification amounts to a commitment of the companion controls 306 to the values which they write into the primary controls 302, the implementation provides a mechanism to control this notification with a fine degree of control. This control is provided by the RejectThreshold and ConfirmThreshold properties on the answer control, which specify numerical acoustic confidence values below which the system should respectively reject or attempt to confirm a value.)
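The two thresholds divide the confidence range into three bands. The sketch below assumes both thresholds are on the same 0 to 1 scale and interprets the text as follows: below RejectThreshold the value is dropped; below ConfirmThreshold it is written into the primary control but not yet notified (pending confirmation); otherwise it is written and notified. The function names and the `onChange` callback are hypothetical.

```javascript
// Three-way decision from the Answer's thresholds ("below which" semantics).
function classifyConfidence(confidence, answer) {
  if (confidence < answer.rejectThreshold) return "reject";   // drop the value
  if (confidence < answer.confirmThreshold) return "confirm"; // ask the user
  return "accept";                                            // commit it
}

// Interpretation (assumption): binding happens unless the value is rejected,
// but notification, the actual commitment, happens only on acceptance.
function applyRecognizedValue(value, confidence, answer, primary) {
  const decision = classifyConfidence(confidence, answer);
  if (decision !== "reject") primary.value = value;                          // binding
  if (decision === "accept" && primary.onChange) primary.onChange(value);   // notification
  return decision;
}

const answerDepCity = { rejectThreshold: 0.3, confirmThreshold: 0.7 };
console.log(classifyConfidence(0.2, answerDepCity)); // reject
console.log(classifyConfidence(0.5, answerDepCity)); // confirm
console.log(classifyConfidence(0.9, answerDepCity)); // accept
```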

A second exemplary set of companion controls 400 is illustrated in FIG. 11. In this embodiment, the companion controls 400 generally include a QA control 402, a Command control 404, a CompareValidator control 406, a CustomValidator control 408 and a semantic map 410. The semantic map 410 is schematically illustrated and includes SemanticItems 412 that form a layer between the visual domain primary controls 302 (e.g. HTML) and a non-visual recognition domain of the companion controls 400.
At this point, it should be emphasized that although the organization of the companion controls QA and Command is different from that of the first set of companion controls discussed above, the functionality remains the same. In particular, the QA control 402 includes a Prompt property that references Prompt objects to perform the functions of output controls, i.e. that provide "prompting" client side markups for human dialog, which typically involves the playing of a prerecorded audio file, or text for text-to-speech conversion, the data included in the markup directly or referenced via a URL. Likewise, the input controls are embodied as the QA control 402 and Command control 404 and also follow human dialog and include the Prompt property (referencing a Prompt object) and an Answer property that references at least one Answer object. Both the QA control 402 and the Command control 404 associate a grammar with expected or possible input from the user of the client device 30. The QA control 402 in this embodiment can thus be considered a question control, an answer control, a confirm control and a statement control, since it includes properties necessary for performing these functions.

Although the QA control 402, Command control 404, CompareValidator control 406, CustomValidator control 408 and other controls, as well as the general structure of these controls, the parameters and event handlers, are specifically discussed with respect to use as companion controls 400, it should be understood that these controls, the general structure, parameters and event handlers can be adapted to provide recognition and/or audible prompting in the other two approaches discussed above with respect to FIGS. 6 and 7. For instance, the Semantic Map 410, which comprises another exemplary mechanism to form the association between the companion controls and visual controls 302, would not be needed when embodied in the approaches of FIGS. 6 and 7.

At this point, it may be helpful to provide a short description of each of the controls. Detailed descriptions are provided below in Appendix B.

QA Control 

In general, the QA control 402, through the properties illustrated, can perform one or more of the following: provide output audible prompting, collect input data, perform confidence validation of the input result, allow confirmation of input data and aid in control of dialog flow at the website, to name a few. In other words, the QA control 402 contains properties that function as controls for a specific topic.

The QA control 402, like the other controls, is executed on the web server 202, which means it is defined on the application development web page held on the web server using the server-side markup formalism (ASP, JSP or the like), but is output as a different form of markup to the client device 30. Although the QA control in FIG. 11 appears to be formed of all of the properties Prompt, Reco, Answers, ExtraAnswers and Confirms, it should be understood that these are merely options, one or more of which may be included in a QA control.

At this point it may be helpful to explain use of the QA controls 402 in terms of application scenarios. Referring to FIG. 11, in a voice-only application the QA control 402 could function as a question and an answer in a dialog. The question would be provided by a Prompt object, while a grammar is defined through a grammar object for recognition of the input data and related processing on that input. An Answers property associates the recognized result with a SemanticItem 412 in the Semantic Map 410 using an Answer object, which contains information on how to process recognition results. Line 414 represents the association of the QA control 402 with the Semantic Map 410, and to a SemanticItem 412 therein. Many SemanticItems 412 are individually associated with a visual or primary control 302, as represented by line 418, although one or more SemanticItems 412 may not be associated with a visual control and used only internally.

In a multimodal scenario, where the user of the client device 30 may touch on the visual textbox, for example with a "TapEvent", an audible prompt may not be necessary. For example, for a primary control comprising a textbox having visual text forming an indication of what the user of the client device should enter in the corresponding field, a corresponding QA control 402 may or may not have a corresponding prompt such as an audio playback or a text-to-speech conversion, but would have a grammar corresponding to the expected value for recognition, and event handlers to process the input, or to process other recognizer events such as no speech detected, speech not recognized, or events fired on timeouts.

In a further embodiment, the recognition result includes a confidence level measure indicating the level of confidence that the recognized result was correct. A confirmation threshold can also be specified in the Answer object, for example, as ConfirmThreshold equals 0.7. If the confidence level exceeds the associated threshold, the result can be considered confirmed.

It should also be noted that, in addition or in the alternative to specifying a grammar for speech recognition, QA controls and/or Command controls can specify Dtmf (dual-tone multi-frequency) grammars to recognize telephone key activations in response to prompts or questions. Appendix B provides details of a Dtmf object that applies a different modality of grammar (a keypad input grammar rather than, for example, a speech input grammar) to the same question. Some of the properties of the Dtmf object include Preflush, which is a flag indicating if "type-ahead" functionality is allowed in order that the user can provide answers to questions before they are asked. Other properties include the number of milliseconds to wait for receiving the first key press, InitialTimeOut, and the number of milliseconds to wait between adjacent key presses, InterdigitTimeOut. Client-side script functions can be specified for execution through other properties, for example, when no key press is received, OnClientSilence, when the input is not recognized, OnClientNoReco, or when an error is detected, OnClientError.
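The interplay of InitialTimeOut and InterdigitTimeOut can be simulated without real timers. In this sketch, key presses are a hypothetical list of `{ key, atMs }` timestamps measured from the end of the prompt, the keypad grammar is modelled as a regular expression, and the option names are camel-cased stand-ins for the Dtmf properties named above. Preflush is not modelled.

```javascript
// Simulated Dtmf collection (hypothetical model, no real timers).
function collectDtmf(keyEvents, opts) {
  const digits = [];
  let lastAt = 0;
  for (const ev of keyEvents) {
    // The first key is governed by InitialTimeOut, later keys by
    // InterdigitTimeOut (both in milliseconds).
    const limit = digits.length === 0 ? opts.initialTimeOut : opts.interdigitTimeOut;
    if (ev.atMs - lastAt > limit) break; // timed out: stop collecting
    digits.push(ev.key);
    lastAt = ev.atMs;
  }
  if (digits.length === 0) {
    if (opts.onClientSilence) opts.onClientSilence(); // no key press received
    return null;
  }
  const input = digits.join("");
  if (opts.grammar && !opts.grammar.test(input)) {
    if (opts.onClientNoReco) opts.onClientNoReco(input); // not in the grammar
    return null;
  }
  return input;
}

// "How many days last week did you ride the bus?" accepts one key, 0-7.
const daysOpts = { initialTimeOut: 3000, interdigitTimeOut: 2000, grammar: /^[0-7]$/ };
console.log(collectDtmf([{ key: "3", atMs: 800 }], daysOpts)); // 3
```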

At this point it should be noted that when a SemanticItem 412 of the Semantic Map 410 is filled, through recognition of speech or Dtmf for example, several actions can be taken. First, an event can be issued or fired indicating that the value has been "changed". Depending on whether the confirmation level was met, another event that can be issued or fired is a "confirm" event that indicates that the corresponding SemanticItem has been confirmed. These events are used for controlling dialog.
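A SemanticItem that fires "changed" and "confirmed" events might look like the sketch below. The class, its method names and the listener mechanism are hypothetical; the confirmation rule follows the text above (the value counts as confirmed when its confidence exceeds ConfirmThreshold, 0.7 in the earlier example), and `clear()` models the effect of a "No" to a confirmation prompt.

```javascript
// Hypothetical SemanticItem: stores a value and fires "changed" and
// "confirmed" events to registered listeners.
class SemanticItem {
  constructor(id, confirmThreshold) {
    this.id = id;
    this.confirmThreshold = confirmThreshold;
    this.value = null;
    this.confirmed = false;
    this.listeners = { changed: [], confirmed: [] };
  }
  on(eventName, fn) { this.listeners[eventName].push(fn); }
  emit(eventName) { this.listeners[eventName].forEach((fn) => fn(this)); }
  // Fill the item from a recognition result (speech or Dtmf).
  fill(value, confidence) {
    this.value = value;
    this.confirmed = false;
    this.emit("changed");
    if (confidence > this.confirmThreshold) this.confirm();
  }
  // Called directly, or when the user answers "Yes" to a confirmation.
  confirm() {
    this.confirmed = true;
    this.emit("confirmed");
  }
  // A "No" to "Did you say ...?" clears the item.
  clear() {
    this.value = null;
    this.confirmed = false;
    this.emit("changed");
  }
}

const siDepCity = new SemanticItem("siDepCity", 0.7);
siDepCity.on("confirmed", (item) => console.log("confirmed:", item.value));
siDepCity.fill("Seattle", 0.9); // prints "confirmed: Seattle"
```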

The Confirms property can also include Answer objects having a structure similar to that described above with respect to the Answers property, in that each is associated with a SemanticItem 412 and can include a ConfirmThreshold if desired. The Confirms property is not intended to obtain a recognition result per se, but rather to confirm a result already obtained and ascertain from the user whether the result obtained is correct. The Confirms property is a collection of Answer objects used to assert whether the value of a previously obtained result was correct. The containing QA's Prompt object will inquire about these items: it obtains the recognition result from the associated SemanticItem 412 and forms it into a question such as "Did you say Seattle?" If the user responds with an affirmation such as "Yes", the confirmed event is then fired. If the user responds in the negative, such as "No", the associated SemanticItem 412 is cleared.

It should be noted that in a further embodiment, the Confirms property can also accept corrections after a confirmation prompt has been provided to the user. For instance, in response to a confirmation prompt "Did you say Seattle?" the user may respond "San Francisco" or "No, San Francisco", in which case the QA control has received a correction. Having information as to which SemanticItem is being confirmed through the Answer object, the value in the SemanticItem can be replaced with the corrected value. It should also be noted that, if desired, confirmation can be included in a further prompt for information such as "When did you want to go to Seattle?", where the prompt by the system includes a confirmation for "Seattle" and a further prompt for the day of departure. A response by the user providing a correction to the place of destination would activate the Confirms property to correct the associated SemanticItem, while a response with only a day of departure would provide implicit confirmation of the destination.

The ExtraAnswers property allows the application
author to specify Answer objects for information that
a user may provide in addition to what a prompt or
query has asked for. For instance, if a travel
oriented system prompts a user for a destination
city, but the user responds by indicating "Seattle
tomorrow", the Answers property that initially
prompted the user will retrieve and therefore bind
the destination city "Seattle" to the appropriate
SemanticItem, while the ExtraAnswers property can
process "tomorrow" as the next succeeding day
(assuming that the system knows the current day), and
thereby bind this result to the appropriate
SemanticItem in the Semantic Map. The ExtraAnswers
property includes one or more Answer objects defined
for possible extra information the user may also
state. In the example provided above, having also
retrieved information as to the day of departure, the
system would then not need to reprompt the user for
this information, assuming that the confidence level
exceeded the corresponding ConfirmThreshold. If the
confidence level did not exceed the corresponding
threshold, the appropriate Confirms property would be
activated.
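A minimal Python sketch of how an Answer and an ExtraAnswer might both bind
results from the single utterance "Seattle tomorrow" follows. The slot names,
the dictionary shape of the recognition result, the 0.7 default for
ConfirmThreshold and the state strings are all assumptions made for
illustration.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SemanticItem:
    value: Optional[object] = None
    confidence: float = 0.0
    state: str = "Empty"


def bind(item: SemanticItem, value, confidence: float,
         confirm_threshold: float) -> None:
    # Values whose confidence exceeds the threshold need no confirmation.
    item.value = value
    item.confidence = confidence
    item.state = ("Confirmed" if confidence > confirm_threshold
                  else "NeedsConfirmation")


def process_result(result: Dict[str, tuple],
                   semantic_map: Dict[str, SemanticItem],
                   today: datetime.date,
                   confirm_threshold: float = 0.7) -> None:
    """result maps hypothetical slot names to (text, confidence) pairs."""
    if "city" in result:  # the Answer for the question that was asked
        text, conf = result["city"]
        bind(semantic_map["destination"], text, conf, confirm_threshold)
    if "day" in result:  # an ExtraAnswer the user volunteered
        text, conf = result["day"]
        # Only "tomorrow" is resolved in this sketch; it relies on the
        # system knowing the current day.
        if text.lower() == "tomorrow":
            bind(semantic_map["departure_date"],
                 today + datetime.timedelta(days=1), conf, confirm_threshold)
```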

Command Control

Command controls 404 are user utterances
common in voice-only dialogs which typically have
little semantic import in terms of the question
asked, but rather seek assistance or effect
navigation, e.g. help, cancel, repeat, etc. The
Command control 404 can include a Prompt property to
specify a prompt object. In addition, the Command
control 404 can be used to specify not only the
grammar (through a Grammar property) and associated
processing on recognition (rather like an Answer
object without binding of the result to a
SemanticItem), but also a 'scope' of context and a
type. This allows for the authoring of both global
and context-sensitive behavior in the client-side
markup. The Command control 404 allows additional
types of input such as "help" commands, or commands
that allow the user of the client device to navigate
to other selected areas of the website.

CompareValidator Control

The CompareValidator control compares two
values according to an operator and takes an
appropriate action. The values to be compared can be
of any form such as integers, strings of text, etc.

The CompareValidator includes a property
SemanticItemToValidate that indicates the SemanticItem
that will be validated. The SemanticItem to be
validated can be compared to a constant or another
SemanticItem, where the constant or other
SemanticItem is provided by the properties
ValueToCompare and SemanticItemToCompare,
respectively. Other parameters or properties
associated with the CompareValidator include
Operator, which defines the comparison to be made,
and Type, which defines the type of value of the
SemanticItems, for example, integer or string.

If the validation associated with the
CompareValidator control fails, a Prompt property can
specify a Prompt object that can be played
instructing the user that the result obtained was
incorrect. If upon comparison the validation fails,
the associated SemanticItem defined by
SemanticItemToValidate is indicated as being empty, in
order that the system will reprompt the user for a
correct value. However, it may be helpful not to
clear the incorrect value of the associated
SemanticItem in the Semantic Map in the event that
the incorrect value will be used in a prompt to the
user reiterating the incorrect value. The
CompareValidator control can be triggered either when
the value of the associated SemanticItem changes or
when the value has been confirmed, depending on the
desires of the application author.
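The CompareValidator behavior can be sketched in Python as below. The
operator and type names in the two lookup tables are hypothetical, since the
text does not enumerate the permitted values, and playing the Prompt object
is left out. The failing item is marked Empty while its old value is kept,
so a reprompt could still repeat it back to the user.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Hypothetical Operator and Type names; the source does not list them.
OPERATORS = {"Equal": operator.eq, "NotEqual": operator.ne,
             "GreaterThan": operator.gt, "LessThan": operator.lt}
TYPES = {"Integer": int, "String": str}


@dataclass
class SemanticItem:
    value: Optional[str] = None
    state: str = "Empty"


def compare_validate(to_validate: SemanticItem, op: str, type_name: str,
                     value_to_compare=None,
                     item_to_compare: Optional[SemanticItem] = None) -> bool:
    """Return True if validation passes; otherwise mark the item Empty."""
    cast = TYPES[type_name]
    # SemanticItemToCompare takes the place of ValueToCompare when given.
    other = (item_to_compare.value if item_to_compare is not None
             else value_to_compare)
    ok = OPERATORS[op](cast(to_validate.value), cast(other))
    if not ok:
        # Empty state triggers a reprompt; the old value is kept so the
        # prompt can reiterate it.
        to_validate.state = "Empty"
    return ok
```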

CustomValidator Control

The CustomValidator control is similar to
the CompareValidator control. A property
SemanticItemToValidate indicates the SemanticItem that
will be validated, while a property
ClientValidationFunction specifies a custom
validation routine through an associated function or
script. The function would provide a Boolean value,
"yes" or "no" or an equivalent thereof, indicating
whether the validation succeeded or failed. A Prompt
property can specify a Prompt object to provide
indications of errors or failure of the validation.
The CustomValidator control can be triggered either
when the value of the associated SemanticItem changes
or when the value has been confirmed, depending on
the desires of the application author.

Call Control

In a further embodiment, controls are
provided that enable application authors to create
speech applications that handle telephony
transactions. In general, the controls implement or
invoke well-known telephony transactions such as ECMA
(European Computer Manufacturers Association) CSTA
(Computer Supported Telecommunications Applications)
messages, eventing and services. As is known, CSTA
specifies application interfaces and protocols for
monitoring and controlling calls and devices in a
communication network. These calls and devices may
support various media and can reside in various
network environments such as IP, switched circuit
networks and mobile networks.

In the illustrated embodiment, the controls
available to the application author include a
SmexMessage control (SMEX: Simple Message Exchange), a
TransferCall control, a MakeCall control, a
DisconnectCall control and an AnswerCall control.
Like the controls described above, these controls can
be executed on the server so as to generate client-
side markup that, when executed on the client device,
performs the desired telephony transaction.

Referring to FIG. 4, the client-side markup
generated by server 202 can be executed by voice
browser 216, which in turn provides telephony
transaction instructions (e.g. CSTA service calls)
to the media server 214 and gateway 210 as necessary
to perform the desired telephony transaction.
Appendix B provides detailed information regarding
each of the properties available in the controls. The
controls are commonly used in a voice-only mode such
as by voice browser 216 in FIG. 4; however, it should
be understood that applications can also be written
to be executed on a multi-modal client device.

Control Execution Algorithm 

As in the previous set of controls, a
client-side script or module (herein referred to as
"RunSpeech") is provided to the client device for the
controls of FIG. 11. Again, the purpose of this
script is to execute dialog flow via logic, which is
specified in the script, when executed on the client
device 30, i.e. when the markup pertaining to the
controls is activated for execution on the client due
to values contained therein. The script allows
multiple dialog turns between page requests and,
therefore, is particularly helpful for control of
voice-only dialogs such as through telephony browser
216. The client-side script RunSpeech is executed in
a loop on the client device 30 until a completed
form is submitted, or a new page is otherwise
requested from the client device 30.

Generally, in one embodiment, the algorithm 
generates a dialog turn by outputting speech and 
recognizing user input. The overall logic of the 
algorithm is as follows for a voice-only scenario 
(reference is made to Appendix B for properties or 
parameters not otherwise discussed above) : 

1. Find the first active (as defined below) QA, 
CompareValidator or CustomValidator control in 
speech index order. 

2. If there is no active control, submit the page. 

3. Otherwise, run the control. 

A QA is considered active if and only if:

1. The QA's clientActivationFunction either is not
present or returns true, AND

2. If the Answers property collection is non-empty,
the State of all of the SemanticItems pointed to
by the set of Answers is Empty, OR

3. If the Answers property collection is empty, the
State of at least one SemanticItem in the Confirm
array is NeedsConfirmation.

However, if the QA has PlayOnce true and its Prompt
has been run successfully (reached OnComplete), the QA
will not be a candidate for activation.
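The activation test above can be written as a small Python predicate. This
is a sketch only; the QA and SemanticItem classes are hypothetical stand-ins
that keep just the fields the test reads, and clientActivationFunction is
modeled as a zero-argument callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SemanticItem:
    state: str = "Empty"


@dataclass
class QA:
    answers: List[SemanticItem] = field(default_factory=list)
    confirms: List[SemanticItem] = field(default_factory=list)
    client_activation_function: Optional[Callable[[], bool]] = None
    play_once: bool = False
    prompt_completed: bool = False  # Prompt reached OnComplete


def is_active(qa: QA) -> bool:
    if qa.play_once and qa.prompt_completed:
        return False  # PlayOnce prompt already ran successfully
    # Condition 1: activation function absent or returning true.
    if (qa.client_activation_function is not None
            and not qa.client_activation_function()):
        return False
    # Condition 2: with Answers, every answered item must still be Empty.
    if qa.answers:
        return all(item.state == "Empty" for item in qa.answers)
    # Condition 3: without Answers, some confirm item needs confirmation.
    return any(item.state == "NeedsConfirmation" for item in qa.confirms)
```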

A QA is run as follows:

1. If this is a different control than the previous
active control, reset the prompt Count value.

2. Increment the Prompt Count value.

3. If PromptSelectFunction is specified, call the
function and set the Prompt's inlinePrompt to
the returned string.

4. If a Reco object is present, start it. This Reco
should already include any active command
grammar.
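The four steps for running a QA might look like the following Python
sketch. Passing the prompt count to PromptSelectFunction is an assumption
(the text does not say what arguments the function receives), the
RunSpeechState class is hypothetical, and starting the Reco object is only
recorded rather than performed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QA:
    name: str
    prompt_select_function: Optional[Callable[[int], str]] = None
    inline_prompt: str = ""
    prompt_count: int = 0


class RunSpeechState:
    """Hypothetical holder for state kept between dialog turns."""

    def __init__(self) -> None:
        self.previous: Optional[QA] = None
        self.started_recos: List[str] = []

    def run_qa(self, qa: QA) -> None:
        if qa is not self.previous:
            qa.prompt_count = 0      # step 1: new control, reset the count
        self.previous = qa
        qa.prompt_count += 1         # step 2
        if qa.prompt_select_function is not None:
            # step 3: the function may vary the prompt on reprompts
            qa.inline_prompt = qa.prompt_select_function(qa.prompt_count)
        # step 4: start recognition (recorded instead of a real Reco object)
        self.started_recos.append(qa.name)
```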

A Validator (either a CompareValidator or a
CustomValidator) is active if:

1. The SemanticItemToValidate has not been
validated by this validator and its value has
changed.

A CompareValidator is run as follows:

1. Compare the values of the SemanticItemToCompare
or ValueToCompare and SemanticItemToValidate
according to the validator's Operator.

2. If the test returns false, empty the text field
of the SemanticItemToValidate and play the
prompt.

3. If the test returns true, mark the
SemanticItemToValidate as validated by this
validator.

A CustomValidator is run as follows:

1. The ClientValidationFunction is called with the
value of the SemanticItemToValidate.

2. If the function returns false, the SemanticItem is
cleared and the prompt is played; otherwise, the
SemanticItem is marked as validated by this validator.
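A Python sketch of running a CustomValidator follows. The class and field
names are hypothetical; ClientValidationFunction is modeled as a callable
returning a Boolean, and "playing" the prompt simply records it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class SemanticItem:
    value: Optional[str] = None
    validated_by: Set[str] = field(default_factory=set)


@dataclass
class CustomValidator:
    name: str
    client_validation_function: Callable[[Optional[str]], bool]
    prompt: str
    played: List[str] = field(default_factory=list)

    def run(self, item: SemanticItem) -> bool:
        # Step 1: call the custom function with the item's value.
        if self.client_validation_function(item.value):
            item.validated_by.add(self.name)  # validated by this validator
            return True
        # Step 2 (failure): clear the item and play the prompt.
        item.value = None
        self.played.append(self.prompt)
        return False
```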



A Command is considered active if and only if:
1. It is in Scope, AND
2. There is not another Command of the same Type
lower in the scope tree.
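One way to read the Command activation rule is sketched below in Python,
assuming the scope tree is given as a child-to-parent map and that a command
attached to a deeper node overrides a command of the same Type attached
higher up. The Command class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Command:
    type: str   # e.g. "help", "cancel", "repeat"
    scope: str  # id of the scope-tree node the command is attached to


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return node and its ancestors, innermost first."""
    chain = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def active_commands(commands: List[Command], current: str,
                    parent: Dict[str, Optional[str]]) -> List[Command]:
    chain = ancestors(current, parent)
    # Depth of each in-scope node: the root is 1, deeper nodes are larger.
    depth = {n: len(chain) - i for i, n in enumerate(chain)}
    in_scope = [c for c in commands if c.scope in depth]
    active = []
    for c in in_scope:
        shadowed = any(o.type == c.type and depth[o.scope] > depth[c.scope]
                       for o in in_scope)
        if not shadowed:
            active.append(c)
    return active
```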



In the multi-modal case, the logic is simplified to
the following algorithm:

1. Wait for a triggering event, i.e., the user tapping
on a control;

2. Collect expected answers;

3. Listen for input;

4. Bind the result to a SemanticItem, or if none, throw
an event;

5. Go back to 1.



FOCUS TRACKING 

The following discussion regarding focus
tracking will be described with respect to use of
SemanticItems as described above. However, it should
be understood that this is but one embodiment, and the
techniques described below with regard to processing
recognition results and maintaining focus on
information recently recognized or provided from the
user can be applied to the other embodiments
described above.

The foregoing algorithm for the voice-only
scenario uses QA (Question-Answer) controls and
SemanticItems to formulate the dialogs. As described
above, each SemanticItem contains a recognition
result, the confidence that the system has in it, and
its current state. QA controls contain information,
including prompts and grammars, that is used to ask
questions, recognize answers and update the
SemanticItems. QA controls also contain answer and
extra-answer objects that are used to specify the QA
activation logic and the processing to be done with
the results. Both answers and extra-answers take the
recognition results returned by the speech recognizer
and update SemanticItems with the values extracted
from the recognition results. The difference between
answers and extra-answers lies in the activation
logic used by the system: if a SemanticItem already
contains a value, the system will not process answers
related to it. On the other hand, extra-answers can
be activated irrespective of whether their related
SemanticItem already contains a value or not.

Although the foregoing algorithm works well in
many applications, problems can arise, as discussed in
the Background Section, when, for example, extra
answers are being processed in a mixed-initiative
dialogue with the user. An aspect of the present
invention is to allow the system to automatically
adapt the dialogue flow so that it stays focused on
the user's most recent input. In the algorithm
discussed above, the dialogue flow is constrained by
two main sources: the activation logic of the QA
controls (based on the SemanticItem state and the
QAs' answer/confirm/extra-answer specification) and
the speech index of the QA controls. Generally, this
aspect of the invention adds a third constraint,
herein referred to as "focus". Whenever a
SemanticItem is modified, which herein represents
recently received recognition results, this
information is retained in a manner that provides
an order indicating when SemanticItems have been
changed relative to each other. In this manner, the
most recently changed SemanticItem can be identified.

In one form, memory is used in the form of a "stack",
which is pictorially illustrated in FIG. 13 at 450.
The stack 450 comprises identifiers, such as 451, 452,
453 and 454, of SemanticItems related to recognition
results received. As a SemanticItem is changed
through the receipt of recognition results, it is
added to the stack 450. Then, when the RunSpeech
algorithm looks for a suitable QA control to execute
next, it will only consider QA controls that are
related to the SemanticItem at the top of the stack
450. This means that QAs that are later in speech
index order than other QAs can be "promoted" and run
before them, provided that they are active and
related to the top-most SemanticItem whereas the
others are not. If no suitable QA can be found, the
stack 450 is "popped" or decremented and the
RunSpeech algorithm searches for suitable QAs again.
This process may repeat until the stack 450 is empty,
in which case the RunSpeech algorithm acts with the
usual (non-focused) behavior as described above.

A system-wide "focusing" value can be used
to identify whether or not focusing is to be performed.
With focusing, the general algorithm described above
can be represented as follows, where other portions
are the same.

1. If focusing is desired and the stack is not
empty (indicating focused SemanticItems are
present), find the first active (as defined
above) QA corresponding to the SemanticItem
at the top of the stack.

Otherwise, if focusing is not desired or the
stack is empty (indicating no SemanticItems are
present), find the first active (as defined
above) QA, CompareValidator or CustomValidator
control in speech index order.

2. If there is no active control, submit the page.

3. Otherwise, run the control.

Another way to describe this technique is that
all QA controls that are not related to the
SemanticItem at the top of the stack are removed, and
the active QA is then selected as usual.
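The focused selection step can be sketched as follows in Python. SemanticItems
are represented by hypothetical string identifiers, the stack is a list whose
last element is the top, the usual activation test is passed in as a
callable, and validators are omitted from the non-focused fallback for
brevity. When the function returns None, the caller would submit the page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QA:
    name: str
    answers: List[str] = field(default_factory=list)   # SemanticItem ids
    confirms: List[str] = field(default_factory=list)  # SemanticItem ids


def select_control(qas: List[QA], stack: List[str], focusing: bool,
                   is_active: Callable[[QA], bool]) -> Optional[QA]:
    """qas is in speech index order; stack[-1] is the most recent item."""
    while focusing and stack:
        top = stack[-1]
        # Drop every QA unrelated to the focused item, then select as usual.
        related = [qa for qa in qas if top in qa.answers or top in qa.confirms]
        for qa in related:
            if is_active(qa):
                return qa
        stack.pop()  # nothing suitable for this item: pop and search again
    # Focusing off or stack empty: the usual speech-index-order search.
    return next((qa for qa in qas if is_active(qa)), None)
```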

As appreciated by those skilled in the art, the
use of a stack is but one way to accomplish focusing
as described above. Other forms of focusing, or of
prioritizing which QA control will be executed in order
to maintain focus, include, but are not limited to,
counters, pointers, indices, time stamps, etc., as is
well known to those skilled in the art.

It should also be noted that, in a further embodiment,
the mechanism or manner in which the most recent
SemanticItems are saved need not ensure that they
are all saved indefinitely. Rather, referring to FIG.
13, the stack 450, illustrated by way of example
for storing such information, can be of some selected
or finite length such that when the stack 450 is full
and a further SemanticItem is placed thereon, the
lowest or oldest SemanticItem is pushed off.
This technique may be convenient in that the dialogue
created thereby will not return to some item of
information that the user spoke long ago.

The stack or other form of memory is also
accessible to the application author through program
logic such that it can be erased or reset, for example,
if the user moves on to a different subject where the
stored semantic information will no longer be used.
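A finite-length focus stack with a reset hook can be sketched in Python with
collections.deque, whose maxlen argument makes append() discard the oldest
entry when the deque is full. The FocusStack class, its method names and the
default length of 4 are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class FocusStack:
    """Hypothetical finite-length focus stack; the oldest entry falls off."""

    def __init__(self, max_len: int = 4) -> None:
        self._items: Deque[str] = deque(maxlen=max_len)

    def push(self, item_id: str) -> None:
        # On a full deque, append() silently drops the leftmost (oldest) entry.
        self._items.append(item_id)

    def top(self) -> Optional[str]:
        return self._items[-1] if self._items else None

    def pop(self) -> Optional[str]:
        return self._items.pop() if self._items else None

    def reset(self) -> None:
        # Exposed to application logic, e.g. when the user changes subject.
        self._items.clear()

    def items(self) -> List[str]:
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)
```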

In one exemplary embodiment, the use of focus is
controlled by a Boolean property on each QA control,
herein referred to as "focusing". If the focusing
property is set to true, the SemanticItems modified by
that QA are put on the stack 450 and the focusing
mechanism operates until the stack 450 is emptied as
described above, provided that a system-wide focusing
parameter is either not present or is set for
focusing. If the focusing property is set to false,
the SemanticItems are not put on the stack 450 and
focusing does not take place, at least with respect
to those SemanticItems.
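The interaction between the per-QA focusing property and a system-wide
focusing parameter can be sketched as below. Modeling an absent system-wide
parameter as None is an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QA:
    name: str
    focusing: bool = True  # per-QA Boolean property


def record_modification(qa: QA, modified_items: List[str], stack: List[str],
                        system_focusing: Optional[bool] = None) -> None:
    """Push the SemanticItems modified by qa onto the focus stack if allowed.

    system_focusing=None models a system-wide parameter that is absent.
    """
    if system_focusing is False or not qa.focusing:
        return  # items are not put on the stack; no focusing for them
    stack.extend(modified_items)
```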

An exemplary embodiment of activation logic for
determining whether any QAs are related to recently
received recognition results is illustrated in FIG.
14. Generally, at least one answer or confirm of a QA
must be related to the focused SemanticItem.

At step 460, the most recent SemanticItem is
identified from the stack 450. At step 462, each QA
control's corresponding answers are compared for a
relation to the most recent SemanticItem. If a
related answer is found, the QA control is considered
for execution, subject to the QA being active under
the usual activation conditions (the SemanticItem
should be empty, etc.).

If at step 462 no QA control is found based on
related answers, the method continues at step 464,
whereat each QA control's corresponding confirms are
compared for a relation to the most recent
SemanticItem. If a related confirm is found, the QA
control is considered for execution.

If at step 464 no QA control is found based on
related confirms, the method continues at step 466,
whereat the SemanticItem is removed from the stack.
If the stack is not empty, the method returns
to step 460.
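The FIG. 14 procedure (steps 460 to 466) can be sketched in Python as
follows. The usual activation test is passed in as a callable, SemanticItems
are hypothetical string identifiers, and the top of the stack is the last
list element.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QA:
    name: str
    answers: List[str] = field(default_factory=list)
    confirms: List[str] = field(default_factory=list)


def find_focused_qa(qas: List[QA], stack: List[str],
                    is_active: Callable[[QA], bool]) -> Optional[QA]:
    while stack:
        top = stack[-1]                         # step 460
        for qa in qas:                          # step 462: related answers
            if top in qa.answers and is_active(qa):
                return qa
        for qa in qas:                          # step 464: related confirms
            if top in qa.confirms and is_active(qa):
                return qa
        stack.pop()                             # step 466
    return None  # caller falls back to speech index order
```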

In the embodiment described above, one
SemanticItem at a time is added to the stack 450 as
each recognition result is obtained. In another
embodiment, each element 451, 452, 453 and 454 in the
stack 450 can represent an array, where each array
contains one or more SemanticItems. The method of
activating a QA illustrated in FIG. 14 is essentially
the same; however, each SemanticItem in the array for
each layer of the stack 450 is examined before the
array is removed from the stack 450.
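For the variant in which each stack layer is an array of SemanticItems, a
sketch might look like this: every item in the top layer is tried, answers
before confirms as in FIG. 14, before the whole layer is popped. Names and
data shapes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QA:
    name: str
    answers: List[str] = field(default_factory=list)
    confirms: List[str] = field(default_factory=list)


def find_focused_qa_layers(qas: List[QA], stack: List[List[str]],
                           is_active: Callable[[QA], bool]) -> Optional[QA]:
    """Each stack layer holds the SemanticItems changed by one result."""
    while stack:
        for item in stack[-1]:                  # every item in the top layer
            for kind in ("answers", "confirms"):
                for qa in qas:
                    if item in getattr(qa, kind) and is_active(qa):
                        return qa
        stack.pop()                             # only then drop the layer
    return None
```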

In some cases a QA control may need to be played
in speech index order, irrespective of being
identified by the method of FIG. 14. In a further
embodiment, a property is provided for each QA
control specifying whether or not it should be
included in the focusing mechanism, if identified.

The following provides an example of how the
focusing mechanism retains order or focus in a
mixed-initiative travel dialogue requiring a
departure city and a destination city. Suppose the
question or prompt rendered is "What are your travel
plans?". A user may answer "I'd like to go to
Seattle." or "I'd like to leave from Paris." In the
first case, the dialogue should go on with the
confirmation "Did you say Seattle?" and then ask for
the departure city, since this information was not
given.

In the second case, the system should confirm
the departure city first and then ask about the
destination city. If the user provided both
the departure city and the destination city, the
application author may want to confirm the departure
city first and then the arrival city. Writing such a
dialogue flow without using automated focusing is
time-consuming and error-prone. Using the focusing
mechanism, the application author does not have to
worry as much about getting the ordering right. (The
application author will still need to handle some of
the ordering, e.g., which city to confirm when both
are given at the same time, although not as much as
before.) If the user provides the departure
city, the associated SemanticItem is pushed on the
stack. When the control algorithm RunSpeech looks for
a suitable QA control to execute, only QA controls
related to that SemanticItem will be considered. In
this case the QA confirming the departure city will
be executed, even if the QA asking for the
destination city comes earlier in speech index order.

Handled manually, the focusing problem quickly becomes
intractable if three or more pieces of information
can be provided. In the example above, assume the
airline to be used is also desired besides the
departure and destination cities. A user may provide
input to the initial question of travel plans by
stating "On United Airlines, I want to depart from
Seattle", "I want to fly to Minneapolis on American
Airlines", or "I want to fly from Seattle to
Chicago." Each of these examples provides a different
set of information. Trying to predict all possible
dialogue flows, and making sure that the system
appropriately confirms (and asks again in case the
user denies that the recognition result is correct)
and asks for remaining required information in a
logical manner, is a very difficult task. However,
this becomes straightforward using the focusing
mechanism.

ASSISTED MULTI-MODAL DIALOGUE

The foregoing has provided separate
algorithms for controlling dialogue in a voice-only
scenario and in a multi-modal scenario. Described
below is a single algorithm that can be used to
control the dialog in either the voice-only scenario
or the multi-modal scenario, which, among other
benefits, would allow the user to easily switch
between modes of operation for any page loaded on the
client device.

Currently, in the voice-only scenario, the
algorithm starts once the page has been loaded on the
client device, for example, the voice browser 216,
and stops once all the information has been obtained.
The page is then sent from the client device to the
server 202.

In a modified form discussed below,
pressing on a textbox causes the algorithm to run a
dialogue associated with that textbox, rather than
starting at the beginning of the page. The dialogue
can include focusing as described above. This gives
the user the ability to enter information as he/she
desires. It should be noted that in the multi-modal
algorithm discussed earlier, such a dialog was not
run.

FIG. 13 pictorially illustrates information
to be gathered, organized as "topics", in order to
execute portions of the dialog. In the illustrative
example, the topics pertain to a travel site that
allows users to input information related to a
departure city 500, a departure date and time 502, an
arrival city 504 and an arrival date and time 506.
Each topic 500, 502, 504 and 506 comprises a
collection of one or more questions, answers,
commands or validators, such as those illustrated in
FIG. 11, to form a dialog for each corresponding
topic. Each of the collections 500, 502, 504 and 506
includes a label or identifier 500A, 502A, 504A and
506A, respectively. In addition, the collections can
be grouped into two or more sets also identified by a
label or identifier. In the illustrated example, a
larger collection is identified as 508A
(representative of the complete page), which
comprises the collections 500, 502, 504 and 506 as a
group or hierarchy.

Although each of the collections 500,
502, 504 and 506 can be constructed by using a
combination of the various controls illustrated in
FIG. 11 and discussed above, if desired, the controls
can be further grouped together in a larger control
forming a template. In this manner, an application
author can select the larger control and modify the
individual controls therein as necessary depending
upon the topic under development. Graphical user
interfaces can be employed where the application
author can select the desired controls and "drag and
drop" them on a panel to construct a topic. Such
techniques are well known in website design and other
application development environments.

Organization of the controls in this manner
allows convenient execution of the dialog in both the
voice-only scenario and the multi-modal scenario. In
particular, by using the collection identifiers 500A,
502A, 504A, 506A and 508A, the control algorithm can
be instructed to execute the corresponding dialog for
each collection 500, 502, 504 and 506, individually,
or as a group 508. For example, a simple J-script
command of the form:

RunSpeech.ActiveQA = (Collection Identifier)

can be used to identify the collection and thus the
corresponding dialog to execute.

In the voice-only mode scenario, the
"Collection Identifier" can be set to identifier
508A, in which case the control algorithm will
execute in the manner discussed above for the voice-
only mode of operation to execute the complete
dialog. However, it should be noted that in an
alternative embodiment, a separate manager algorithm
or module 512 can be used to individually and
sequentially activate each of the collections 500,
502, 504 and 506. In the example illustrated in FIG.
13, the manager algorithm 512 would issue a command
identifying that collection 502 is active, whereupon,
after the control algorithm executes the dialog of
collection 502, control returns to the manager
algorithm 512. The manager algorithm 512 would then
issue a command identifying that collection 504 is
now active. This process is repeated until each of
the collections 500, 502, 504 and 506 has been
activated as prescribed by the manager algorithm 512.
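The collection hierarchy and the manager algorithm 512 can be sketched in
Python. Here the effect of assigning RunSpeech.ActiveQA is modeled by a
hypothetical run_dialog callable that returns once the named collection's
dialog is complete; the Collection class and helper functions are likewise
illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Collection:
    label: str  # e.g. "500A" or "508A"
    children: List["Collection"] = field(default_factory=list)


def find(c: Collection, label: str) -> Optional[Collection]:
    """Resolve a collection identifier anywhere in the hierarchy."""
    if c.label == label:
        return c
    for child in c.children:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None


def leaves(c: Collection) -> List[Collection]:
    """Topics (collections with no sub-collections) in page order."""
    if not c.children:
        return [c]
    out: List[Collection] = []
    for child in c.children:
        out.extend(leaves(child))
    return out


def manager(page: Collection, run_dialog: Callable[[str], None]) -> None:
    # Activate each topic in turn; run_dialog returns when that dialog is
    # done, at which point control comes back here for the next topic.
    for topic in leaves(page):
        run_dialog(topic.label)
```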

For the multi-modal application, the
control algorithm maintains a list of those
collections 500, 502, 504 and 506 that are considered
active, for instance, those tied to a textbox on a page
that needs input data. It should be noted that in the
illustrated embodiment, the control algorithm can
ascertain the SemanticItems associated with each
collection that need to be filled, whereupon the data
in each SemanticItem is used to fill in the
corresponding textbox.

FIG. 14 illustrates an exemplary display
rendering 520 for the travel page. As data is
entered, more collections may become active, or may
become irrelevant and thus be deactivated. As described
above, each collection 500, 502, 504 and 506 has an
associated identifier or label 500A, 502A, 504A and
506A in the dialog. Each textbox in the page provided
to the client also has an associated identifier or
label (not shown). When a textbox, or other button
such as button 522 associated with each textbox, is
activated by the user, a simple function can then be
called using the label or identifier of the textbox
or other button 522 as an input parameter to identify
the corresponding collection identifier with which the
selected textbox or button 522 is associated.
With the collection identifier determined, the
control algorithm can be executed upon the dialog of
the corresponding collection, for example, by issuing
the J-script command provided above.
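The textbox-to-collection lookup can be as simple as a table keyed by element
identifier. In this Python sketch the element ids, the table and the
set_active_qa callable (standing in for the RunSpeech.ActiveQA assignment)
are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical mapping from page element ids to collection identifiers.
ELEMENT_TO_COLLECTION: Dict[str, str] = {
    "txtDepartureCity": "500A",
    "btnDepartureCity": "500A",
    "txtArrivalCity": "504A",
}


def on_element_activated(element_id: str,
                         set_active_qa: Callable[[str], None]) -> bool:
    """Start the dialog for the topic tied to the tapped textbox or button."""
    collection_id = ELEMENT_TO_COLLECTION.get(element_id)
    if collection_id is None:
        return False              # element has no associated dialog
    set_active_qa(collection_id)  # plays the role of RunSpeech.ActiveQA = ...
    return True
```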

Organization of the page into topics with
corresponding dialogs identified by collection
identifiers provides additional benefits not
previously available in the multi-modal scenario. As
stated above, the control algorithm in the multi-
modal scenario begins execution of the dialog
associated with the textbox that has been selected.
Once the dialogue has been completed for the selected
textbox or a group of textboxes, having obtained the
speech input and performed any required validation or
confirmation, the control algorithm stops and
possibly waits for further direction from the user.
Alternatively, depending on the author of the
application, an additional "stop" button 526 can be
provided such that when a user activates the stop
button 526, the dialogue for the previously activated
textbox stops. In yet another embodiment, the
user can provide an audible command such as "cancel"
via a command control, which would also stop the
control algorithm with respect to the textbox that
has been selected. Each of the aforementioned
techniques enables the application author to allow
the user to selectively stop the dialog in the multi-
modal scenario. Suitable mechanisms or methods can be
provided in the control algorithm to selectively stop
execution of the control algorithm based on user
action. For example, an application program interface
can be used to stop an executing topic and direct the
control algorithm to begin execution of another
topic.

Another situation which may require the
control algorithm to halt execution occurs when a
confirmation prompt or other confirmation output is
provided in the multi-modal scenario. By contrast, in
a voice-only browser, once input speech is obtained
from the user, the control algorithm may execute a
confirmation control (automatically or based on a
confidence threshold not being exceeded), where the
confirmation control audibly returns to the user
what has been recognized. The confirmation control
then asks if what has been recognized is correct. The
user answers whether or not it has been recognized
correctly, and the control algorithm proceeds to the
next control, such as the next question or QA control.

However, in the multi-modal scenario, the
input speech that has been recognized is generally
rendered in the textbox that has been selected.
Again, based upon automatic confirmation or whether
or not a confidence threshold has been exceeded, a
confirmation prompt can be displayed asking whether
or not the input speech has been properly recognized.
In FIG. 14, a confirmation prompt is illustrated at
530 in response to a confidence threshold not being
exceeded. In this example, the prompt 530 includes a
"yes" button 532 and a "no" button 534. The
confirmation prompt could also be rendered audibly, and
the user's response could be provided by buttons 532
and 534 or via speech input.

At this point, it should be noted that
displaying a prompt rather than audibly rendering it,
when this is available through a multi-modal capable
device, may be preferred depending on the data being
rendered or the current operating environment. For
example, audible rendering of sensitive data, such as
confirmation of a social security number, may not be
preferred in view that such information could
possibly be overheard by others. Thus, parameters can
be established and associated with information being
gathered, allowing the application author to designate
that, if said information is rendered, it should by
default be rendered as a displayed prompt when a
multi-modal client is being used. Likewise, given a
noisy operating environment, audible rendering of
prompts may not be practical. Methods can be employed
by the application author to enable the user to change
the settings of these parameters, if desired.
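The author-designated rendering parameters might be evaluated as in this
Python sketch. The PromptPolicy fields and the rule that a voice-only client
always falls back to audio are assumptions for illustration; both fields are
meant to be settings the user could change if the author allows it.

```python
from dataclasses import dataclass


@dataclass
class PromptPolicy:
    sensitive: bool = False          # author-designated, e.g. an SSN
    noisy_environment: bool = False  # audible prompts may be impractical


def render_mode(policy: PromptPolicy, multimodal_client: bool) -> str:
    if not multimodal_client:
        return "audio"  # a voice-only browser can only speak the prompt
    if policy.sensitive or policy.noisy_environment:
        return "display"
    return "audio"
```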

Also, audible rendering of prompts
typically includes active monitoring of the length of
silence that may result after rendering the audible
prompt. If a sufficient amount of silence occurs
after playing the prompt, the user may not have heard
the prompt, and some methods are then employed to
continue the dialog, for example, by replaying the
prompt. However, when a prompt is displayed, such as
the confirmation 530 of FIG. 14, the notion of time
may be irrelevant. Thus, if a prompt is being
displayed rather than rendered audibly, it may
be appropriate to disable or otherwise modify silence
measuring methods occurring after rendering. In
effect, the control algorithm could suspend the
dialog after displaying a prompt, such as
confirmation prompt 530, and begin again when the
user selects one of the buttons "yes" or "no" in
the prompt 530. Depending on whether any measurement of
inactivity is being monitored by the control
algorithm, activation of the "yes" or "no" buttons
could employ a method to resume processing of the
dialog by the control algorithm.
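A silence monitor that is suspended for displayed prompts could be sketched
as follows. Times are passed in explicitly so the logic is deterministic;
the SilenceMonitor class and its methods are hypothetical.

```python
from typing import Optional


class SilenceMonitor:
    """Hypothetical no-input timer that only runs for audible prompts."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None

    def prompt_rendered(self, displayed: bool, now: float) -> None:
        # A displayed prompt suspends the dialog instead of starting the timer.
        self.started_at = None if displayed else now

    def user_responded(self) -> None:
        # e.g. a "yes"/"no" button or speech input resumes the dialog
        self.started_at = None

    def timed_out(self, now: float) -> bool:
        return (self.started_at is not None
                and now - self.started_at >= self.timeout_s)
```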

Referring back to FIG. 14, if the user
indicates that the recognition was correct, the
recognized input can be rendered in the textbox,
whereas if the user indicates that the recognition
was incorrect, the recognized input would be
discarded. However, in the multi-modal scenario, it
is also desirable to allow the user to ignore
the displayed or otherwise rendered confirmation
question, and proceed to another textbox
or operation with respect to the form provided on the
client device. Thus, it is desirable to allow the
control algorithm to stop execution of the portion of
the control algorithm with respect to the selected
textbox if the user does not respond to the
confirmation prompt. Depending on the author of the
application, the author may choose to ignore the
input speech provided to the textbox, or accept it
with the intent of reconfirming its accuracy at a
later time.

In many applications, it is common to include a
"submit" button 532 for use when the user believes
that he/she has entered all required information. As
indicated above, the user may not have confirmed that
a particular value was properly recognized for a given
textbox; however, upon activation of the submit button
532, the author of the application may at that time
force confirmation of the previously ignored
confirmation prompt, or choose to accept the recognized
input as correct and return the form to the server for
processing.

Another improvement for the multi-modal scenario
allows the user to change the value of a textbox either
through voice recognition or through standard graphical
user interface methods such as a keyboard, handwriting
recognition, etc. Using voice recognition, the control
algorithm initially receives input speech and
ascertains a recognized input for it. This recognized
input is retained by the control algorithm in memory,
for example in the corresponding SemanticItem discussed
above, along with status information such as whether
or not the recognized input has been confirmed. At some
time prior to submission of the page to the web server,
the recognized input is associated with the primary
control. However, in the multi-modal scenario, the user
also has the option of entering the value through a
graphical user interface rather than using voice
recognition. If the user chooses to do so, the value
entered by the user is also replicated in the
corresponding SemanticItem, where the status
information, for example, can be marked as confirmed
because the user manually entered a value that is
assumed to be correct. The SemanticItems and the status
information are exposed through suitable methods so
that the author, when writing the application, can
update these values if the user of the client device
chooses to input values through a graphical user
interface. Likewise, if the user has provided input
speech for a particular entry that was recognized and
displayed back, the user can correct the value using
the graphical user interface rather than the
confirmation control. Again, by exposing the
SemanticItems and status information maintained by the
control algorithm, the values entered through the
graphical user interface and the status information
maintained by the control algorithm can be kept in
correspondence.
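A minimal Python sketch of this bookkeeping follows, assuming a SemanticItem reduces to a value plus a three-state status. Speech input leaves an item unconfirmed, GUI input marks it confirmed, and a helper lists the items an author might force to be confirmed when submit button 532 is pressed. All names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    EMPTY = "empty"
    UNCONFIRMED = "unconfirmed"
    CONFIRMED = "confirmed"


@dataclass
class SemanticItem:
    """Hypothetical minimal stand-in for a SemanticItem."""
    value: Optional[str] = None
    status: Status = Status.EMPTY

    def set_from_speech(self, recognized: str) -> None:
        # Recognized input is retained but still awaits confirmation.
        self.value = recognized
        self.status = Status.UNCONFIRMED

    def set_from_gui(self, entered: str) -> None:
        # A manually entered value is assumed correct.
        self.value = entered
        self.status = Status.CONFIRMED


def needing_confirmation(items: Dict[str, SemanticItem]) -> List[str]:
    """IDs the author might force to be confirmed when submit is pressed."""
    return [item_id for item_id, item in items.items()
            if item.status is Status.UNCONFIRMED]
```

Keeping the status next to the value lets the GUI path and the speech path share one record, which is what keeps the two in correspondence.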

Exposing the SemanticItems and the status information
also allows the application author to reset selected
portions of the dialog information as desired during
execution. For instance, a reset button can be provided
for each portion of the dialog separately, based on the
collections 500, 502, 504 and 506. These buttons are
indicated at 550, 552, 554 and 556, respectively. If the
user activates any of the buttons 550, 552, 554 or 556,
the control algorithm can ascertain which collection
has been identified for resetting using a simple
function (in a manner similar to that discussed above
with respect to identifying textboxes) and then remove
the corresponding values in the associated
SemanticItems for that collection and/or change the
status information to signify that new data is
required. Although use of the reset buttons 550, 552,
554 and 556 in FIG. 14 may appear simplistic, it should
be noted that the illustrated embodiment is merely
exemplary, and that the concept can be applied to more
complex data entry forms where resetting would involve
more textboxes or greater complexity.
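The reset step can be sketched as a lookup from button to collection followed by clearing that collection's items. This is a Python sketch under stated assumptions: the button-to-collection pairs follow FIG. 14, but the item IDs in each collection and the `Item` type are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """Hypothetical minimal SemanticItem: a value plus status."""
    value: Optional[str]
    status: str  # e.g. "confirmed", "unconfirmed", "required"


# Reset button -> dialog collection, as numbered in FIG. 14.
BUTTON_TO_COLLECTION: Dict[int, int] = {550: 500, 552: 502, 554: 504, 556: 506}


def reset_for_button(button: int, collections: Dict[int, List[str]],
                     items: Dict[str, Item]) -> List[str]:
    """Clear the items of the collection tied to `button`.

    Returns the reset IDs; their status now signals that new data is
    required. Items in other collections are untouched.
    """
    item_ids = collections[BUTTON_TO_COLLECTION[button]]
    for item_id in item_ids:
        items[item_id].value = None
        items[item_id].status = "required"
    return item_ids
```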

In the multi-modal scenario, where the user has the
ability to select a textbox for entry of data, the user
will typically know initially what the form is asking
for because a visual prompt is associated with the
textbox. For example, it is clear in FIG. 14 that
textbox 536 is for the "Departure City". Thus, although
the control algorithm is running the dialog associated
with the textbox in the multi-modal scenario, there is
probably no need to play the initial prompt in the
collection 500, as would be necessary in the voice-only
scenario, since the user is probably already provided
with a visual prompt. However, it is also desirable
that the page be renderable in either a multi-modal
scenario or a voice-only scenario. Therefore, the
control algorithm can maintain in memory status
information as to whether the user is operating in a
multi-modal scenario, or whether the page is being
rendered in a voice-only scenario such as through voice
browser 216. In one embodiment, this information can
easily be ascertained by the client device upon receipt
of the form to determine the appropriate mode of entry.
For example, if the client device is a voice browser
216, the control algorithm must operate in a voice-only
mode of entry. The voice-only mode is particularly
suitable for focusing as described above. However, if
the client device is a multi-modal device such as a
PDA, then the mode of entry can optionally default to a
preselected mode of entry, and can be changed, for
example, through activation of a button 540.

Being able to switch the mode of entry for a page
rendered on the client device can be very advantageous.
For instance, if the user is entering data in a
multi-modal scenario by selecting textboxes and
providing voice or speech input which is recognized and
displayed in the corresponding textboxes, the user may
want to switch to a voice-only mode of entry if he/she
needs to perform another activity that prevents
selecting textboxes for entry, such as when the user is
driving. Button 540 or another suitable indicating
device allows the mode of entry to be changed. If the
user has been operating in a multi-modal form of entry,
the control algorithm can then begin processing the
page, for example, at the beginning, and execute the
remaining dialog with voice prompts until the form is
completed. Switching to the situation of using voice
prompts for questions and confirmations (although the
system would otherwise display the results) is another
case in which focusing may be desired. If, on the other
hand, the user has been operating in voice-only mode,
then upon activation of a button on the display, other
user input such as a keyword spoken by the user, or
merely selection of a textbox, the control algorithm
can switch to a multi-modal form of entry, allowing the
user to navigate through the page as desired. In such a
case, the stack or other memory mechanism retaining
focusing information can be reset, since the user will
generally be identifying the textboxes to which spoken
input will be directed. It should be noted that
switching from a voice-only mode to a multi-modal mode,
or vice versa, can apply to the whole page as described
above, or to just the portion of the dialog associated
with one or more textboxes or other required input.
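The mode bookkeeping described above might look like the following Python sketch. It assumes a voice browser cannot leave voice-only mode, that switching to voice-only restarts prompting at the beginning of the page (one of the options mentioned), and that switching to multi-modal clears the focus stack. The class and attribute names are hypothetical.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    VOICE_ONLY = "voice-only"
    MULTI_MODAL = "multi-modal"


class EntryModeController:
    """Hypothetical tracker of the mode of entry and focusing state."""

    def __init__(self, is_voice_browser: bool) -> None:
        # A voice browser (such as voice browser 216) can only run voice-only.
        self.is_voice_browser = is_voice_browser
        self.mode = Mode.VOICE_ONLY if is_voice_browser else Mode.MULTI_MODAL
        self.focus_stack: List[str] = []  # focusing information
        self.next_field_index = 0          # where voice prompting resumes

    def push_focus(self, field_id: str) -> None:
        self.focus_stack.append(field_id)

    def toggle(self) -> Mode:
        """Switch mode, e.g. on activation of button 540."""
        if self.is_voice_browser:
            return self.mode  # no GUI available to switch to
        if self.mode is Mode.MULTI_MODAL:
            self.mode = Mode.VOICE_ONLY
            # One option from the text: restart processing at the beginning.
            self.next_field_index = 0
        else:
            self.mode = Mode.MULTI_MODAL
            # The user now picks textboxes, so focusing info is reset.
            self.focus_stack.clear()
        return self.mode
```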

As indicated above, for a multi-modal form, the control
algorithm can automatically determine that the initial
prompt associated with any textbox not be played, since
a visual prompt is probably already associated with the
textbox selected by the user. However, if upon
selection the user does not provide any voice input,
the client device can then play the initial prompt.
Depending upon the application, there may also be other
audible prompts that should be played, for example,
upon entry of incorrect information. For instance, if
the user has entered a credit card number of
insufficient length, the control algorithm operating in
multi-modal mode would play or otherwise render a
prompt stating that the credit card number was of
insufficient length and should contain a given number
of digits. Thus, although the initial prompt may not be
played in a multi-modal scenario, other prompts
associated with the dialog for the selected textbox may
need to be rendered. The application author has the
flexibility to have these subsequent prompts rendered
audibly, visually, or both, as desired and depending on
the capabilities of the device.
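The three decisions in this paragraph can be sketched in Python: whether to play the initial prompt, whether a corrective prompt is needed, and which output channels to use. The required card length of 16 digits and all function names are assumptions for illustration only.

```python
from typing import Optional, Set


def should_play_initial_prompt(multi_modal: bool, user_spoke: bool) -> bool:
    """Voice-only: always play. Multi-modal: only if the user selected
    the textbox but provided no voice input."""
    return not multi_modal or not user_spoke


def card_length_prompt(digits: str, required: int = 16) -> Optional[str]:
    """Corrective prompt for a too-short card number (16 is an assumption)."""
    if len(digits) < required:
        return (f"The credit card number is too short; "
                f"it should contain {required} digits.")
    return None


def output_channels(author_choice: Set[str], device_caps: Set[str]) -> Set[str]:
    """Render audibly, visually or both: what the author asked for,
    limited to what the device supports."""
    return author_choice & device_caps
```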

From the foregoing, a method and system are provided
for generating mark-up for client-side devices for
speech-enabled applications, including telephony
applications, that further provide focusing. The same
set of controls can be used in three different forms of
interaction: Voice-only, Tap-and-talk (multi-modal) and
Hands-free (multi-modal). In Voice-only, dialogs are
provided on a GUI-less browser, such as for telephony
applications. This kind of application is driven by a
dialog-flow manager that runs on the client (RunSpeech).
In Tap-and-talk, multi-modal dialogs contain a usable
GUI without speech output. System prompts are generally
not provided, and the interaction is managed by the
user's click events on the GUI. In Hands-free
multi-modal, dialogs use a GUI display together with
speech input and output. The dialog may be authored for
Tap-and-talk but may still use the RunSpeech algorithm,
or other speech control features, to enable
system-driven voice prompting, while confirmation is
provided visually or aurally depending on the active
mode of input at the time of confirmation. Switching
between multi-modal/hands-free and voice-only is done by
detecting the type of client the controls are talking
to. Generally, Hands-free mode is switched on on demand.

Although the present invention has been described with
reference to preferred embodiments, workers skilled in
the art will recognize that changes may be made in form
and detail without departing from the spirit and scope
of the invention.
APPENDIX A

1 QA speech control 

The QA control adds speech functionality to the primary
control to which it is attached. Its object model is an
abstraction of the content model of the exemplary tags in
Appendix A.

1.1 QA control 

<Speech:QA
    id="..."
    controlsToSpeechEnable="..."
    speechIndex="..."
    ClientTest="..."
    runat="server" >
    <Question ...>
    <Statement ...>
    <Answer ...>
    <Confirm ...>
    <Command ...>
</Speech:QA>

1.1.1 Core properties 

string ControlsToSpeechEnable 

ControlsToSpeechEnable specifies the list of IDs of the 
primary controls to speech enable. IDs are comma delimited. 

1.1.2 Activation mechanisms

int SpeechIndex

SpeechIndex specifies the ordering information of the QA
control; this is used by RunSpeech. Note: If more than one
QA control has the same SpeechIndex, RunSpeech will execute
them in source order. In situations where some QA controls
have SpeechIndex specified and some QA controls do not,
RunSpeech will order the QA controls first by SpeechIndex,
then by source order.

string ClientTest

ClientTest specifies a client-side script function which
returns a boolean value to determine when the QA control is
considered available for selection by the RunSpeech
algorithm. The system strategy can therefore be changed by
using this as a condition to activate or deactivate QA
controls with finer control than SpeechIndex alone provides.
If not specified, the QA control is considered available for
activation.
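The ordering rule and the ClientTest availability check can be combined in a short Python sketch of how RunSpeech might pick the next QA control. It is deliberately simplified: a real dialog manager would also consider other state (for example, whether the primary control already holds a value), and placing controls without a SpeechIndex after those with one is an assumption, since the text does not say where they go.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QAControl:
    id: str
    source_order: int                   # position of the control in the page source
    speech_index: Optional[int] = None  # SpeechIndex, if specified
    client_test: Optional[Callable[[], bool]] = None  # ClientTest, if specified

    def available(self) -> bool:
        # An unspecified ClientTest means the control is available.
        return self.client_test is None or self.client_test()


def run_order(controls: List[QAControl]) -> List[QAControl]:
    """Order by SpeechIndex, then source order.

    Assumption: controls without a SpeechIndex are placed after all
    controls that have one.
    """
    return sorted(controls, key=lambda c: (
        c.speech_index is None,
        c.speech_index if c.speech_index is not None else 0,
        c.source_order,
    ))


def next_qa(controls: List[QAControl]) -> Optional[QAControl]:
    """First control in run order whose ClientTest allows selection."""
    for control in run_order(controls):
        if control.available():
            return control
    return None
```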

1.1.3 Questions, Statements, Answers, Confirms and Commands

Question[] Questions

QA control contains an array of question objects or
controls, defined by the dialog author. Each question
control will typically relate to a function of the system,
e.g. asking for a value. Each question control may
specify an activation function using the ClientTest
attribute, so an active QA control may ask different kinds
of questions about its primary control under different
circumstances. For example, the activation condition for
the main question Q_Main may be that the corresponding
primary control has no value, and the activation condition
for Q_GiveHelp may be that the user has just requested help.
Each Question may specify answer controls from within the
QA control which are activated when the question control is
output.
Statement[] Statement

QA control contains an array of statement objects or 
controls. Statements are used to provide information to the 
listener, such as welcome prompts. 


Answer[] Answers

QA control contains an array of answer objects or controls.
An answer control is activated directly by a question
control within the QA control, or by a StartEvent from the
primary control. Where multiple answers are used, they will
typically reflect answers to the system functions, e.g.
A_Main might provide a value in response to Q_Main, and
A_Confirm might provide a yes/no + correction to Confirm.

Confirm[] Confirm

QA control may contain a confirm object or control. This
object is a mechanism provided to dialog authors which
simplifies the authoring of common confirmation subdialogs.

Command[] Command

A Command array holds a set of command controls. Command 
controls can be thought of as answer controls without 
question controls, whose behavior on recognition can be 
scoped down the control tree. 

1.2 Question control

The question control is used for the speech output relating
to a given primary control. It contains a set of prompts
for presenting information or asking a question, and a list
of ids of the answer controls which may provide an answer
to that question. If multiple answer controls are
specified, their grammars are loaded in parallel when the
question is activated. An exception will be thrown if no
answer control is specified in the question control.

<Question
    id="..."
    ClientTest="..."
    Answers="..."
    Count="..."
    initialTimeout="..."
    babbleTimeout="..."
    maxTimeout="..."
    Modal="..."
    PromptFunction="..."
    OnClientNoReco="..." >
    <prompt ... />
</Question>

string ClientTest

ClientTest specifies the client-side script function
returning a boolean value which determines under which
circumstances a question control is considered active
within its QA control (the QA control itself must be active
for the question to be evaluated). For a given QA control,
the first question control with a true condition is
selected for output. For example, the function may be used
to determine whether to output a question which asks for a
value ("Which city do you want?") or which attempts to
confirm it ("Did you say London?"). If not specified, the
question condition is considered true.
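The first-true selection rule, together with the Q_Main/Q_GiveHelp example given earlier, can be sketched in Python as below. Representing ClientTest as a zero-argument callable is an assumption; the error for a question with no answer control mirrors the exception described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Question:
    id: str
    answers: List[str]  # IDs of answer controls activated with this question
    client_test: Optional[Callable[[], bool]] = None


def select_question(questions: List[Question]) -> Optional[Question]:
    """Return the first question whose ClientTest is true.

    A missing ClientTest counts as true. A question with no answer
    control is treated as an error.
    """
    for q in questions:
        if not q.answers:
            raise ValueError(f"question {q.id} specifies no answer control")
    for q in questions:
        if q.client_test is None or q.client_test():
            return q
    return None
```

Because the list is scanned in order, a help question placed first takes precedence whenever its test is true.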
Prompt[] Prompts

The prompt array specifies a list of prompt objects,
discussed below. Prompts are also able to specify
conditions of selection (via client functions), and during
RunSpeech execution only the first prompt with a true
condition is selected for playback.

String Answers 

Answers is an array of references by ID to controls that
are possible answers to the question. The behavior is to
activate the grammar from each valid answer control in
response to the prompt asked by the question control.

Integer initialTimeout 
The time in milliseconds between start of recognition and
the detection of speech. This value is passed to the
recognition platform, and if exceeded, an onSilence event
will be thrown from the recognition platform. If not
specified, the speech platform will use a default value.


Integer babbleTimeout 

The period of time in milliseconds in which the recognition
server or other recognizer must return a result after
detection of speech. For recos in "tap-and-talk" scenarios,
this applies to the period between speech detection and the
recognition result becoming available. For recos in
dictation scenarios, this timeout applies to the period
between speech detection and each recognition return, i.e.
the period is restarted after each return of results or
other event. If exceeded, the onClientNoReco event is
thrown, but different status codes are possible. If there
has been any kind of recognition platform error that is
detectable and the babbleTimeout period has elapsed, then
onClientNoReco is thrown with status code -3. Otherwise,
if the recognizer is still processing audio (e.g. in the
case of an exceptionally long utterance or if the user has
kept the pen down for an excessive amount of time), the
onClientNoReco event is thrown with status code -15. If
babbleTimeout is not specified, the speech platform will
default to an internal value.
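The status-code rule above can be sketched in Python as a single function. The function name and parameters are hypothetical, and returning None when neither described condition holds is an assumption, since the text does not cover that case.

```python
from typing import Optional


def babble_timeout_status(ms_since_speech: int, babble_timeout_ms: int,
                          platform_error: bool,
                          still_processing: bool) -> Optional[int]:
    """Status code for onClientNoReco once babbleTimeout expires, else None.

    For dictation, ms_since_speech would be measured from the most recent
    recognition return, since the period restarts after each one.
    """
    if ms_since_speech < babble_timeout_ms:
        return None  # period not yet elapsed
    if platform_error:
        return -3    # detectable recognition platform error
    if still_processing:
        return -15   # e.g. very long utterance or pen held down too long
    return None      # case not described in the text (assumption)
```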

Integer maxTimeout

The period of time in milliseconds between recognition
start and results returned to the client device browser. If
exceeded, the onMaxTimeout event is thrown by the browser;
this caters for network or recognizer failure in
distributed environments. For recos in dictation scenarios,
as with babbleTimeout, the period is restarted after the
return of each recognition or other event. Note that the
maxTimeout attribute should be greater than or equal to the
sum of initialTimeout and babbleTimeout. If not specified,
the value will be a browser default.
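An authoring-time check of the constraint maxTimeout >= initialTimeout + babbleTimeout could look like this Python sketch. The function name is hypothetical, and skipping the check when any value is unspecified is an assumption, since platform and browser defaults are not visible here.

```python
from typing import Optional


def check_timeouts(initial_ms: Optional[int], babble_ms: Optional[int],
                   max_ms: Optional[int]) -> None:
    """Raise if maxTimeout < initialTimeout + babbleTimeout.

    Unspecified values fall back to platform or browser defaults,
    which this sketch cannot see, so the check is skipped then.
    """
    if initial_ms is None or babble_ms is None or max_ms is None:
        return
    if max_ms < initial_ms + babble_ms:
        raise ValueError(
            f"maxTimeout {max_ms} ms is less than initialTimeout + "
            f"babbleTimeout = {initial_ms + babble_ms} ms")
```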

bool modal 

When modal is set to true, no answers except the immediate
set of answers to the question are activated (i.e. no
scoped Answers are considered). The default is false. For
example, this attribute allows the application developer to
force the user of the client device to answer a particular
question.
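In Python, the effect of the modal flag on which answer controls are activated might be sketched as follows. The function name is hypothetical, and so is the choice to de-duplicate answers that are both direct and scoped.

```python
from typing import List


def active_answers(question_answers: List[str], scoped_answers: List[str],
                   modal: bool = False) -> List[str]:
    """Answer controls whose grammars are activated for a question.

    modal=True: only the question's own answers (no scoped answers).
    modal=False (the default): scoped answers are added as well.
    """
    if modal:
        return list(question_answers)
    extra = [a for a in scoped_answers if a not in question_answers]
    return list(question_answers) + extra
```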

String PromptFunction(prompt)

PromptFunction specifies a client-side function that will
be called once the question has been selected but before
the prompt is played. This gives the application developer
a chance to make any last-minute modifications to the
prompt that may be required. PromptFunction takes the ID of
the target prompt as a required parameter.



string OnClientNoReco

OnClientNoReco specifies the name of the client-side
function to call when the NoReco (mumble) event is
received.

1.2.1 Prompt Object

The prompt object contains information on how to play
prompts. All the properties defined are read/write
properties.

<prompt
    id="..."
    count="..."
    ClientTest="..."
    source="..."
    bargeIn="..."
    onClientBargein="..."
    onClientComplete="..."
    onClientBookmark="..." >
    ...text/markup of the prompt...
</prompt>

int count

Count specifies an integer which is used for prompt
selection. When the value of the count specified on a
prompt matches the value of the count of its question
control, the prompt is selected for playback. Legal values
are 0-100.

<Question id="Q_Ask">
    <prompt count="1"> Hello </prompt>
    <prompt count="2"> Hello again </prompt>
</Question>
In the example, when Q_Ask.count is equal to 1, the first
prompt is played, and if it is equal to 2 (i.e. the
question has already been output before), the second prompt
is then played.

string ClientTest 

ClientTest specifies the client-side script function
returning a boolean value which determines under which
circumstances a prompt within an active question control
will be selected for output. For a given question control,
the first prompt with a true condition is selected. For
example, the function may be used to implement prompt
tapering, e.g. "Which city would you like to depart from?"
for a function returning true if the user is a first-timer,
or "Which city?" for an old hand. If not specified, the
prompt's condition is considered true.

string InlinePrompt 

The prompt property contains the text of the prompt to
play. This is defined as the content of the prompt element.
It may contain further markup, such as TTS rendering
information, or <value> elements. As with all parts of the
page, it may also be specified as script code within
<script> tags, for dynamic rendering of prompt output.

string Source 

Source specifies the URL from which to retrieve the text of
the prompt to play. If an inline prompt is specified, this
property is ignored.
Bool BargeIn

BargeIn is used to specify whether or not barge-in (wherein
the user of the client device begins speaking while a prompt
is being played) is allowed on the prompt. The default is
true.

string onClientBargein 

onClientBargein specifies the client-side script function 
which is invoked by the bargein event. 


string onClientComplete 

onClientComplete specifies the client-side script function
which is invoked when the playing of the prompt has
completed.


string OnClientBookmark 

OnClientBookmark accesses the name of the client-side 
function to call when a bookmark is encountered. 

1.2.2 Prompt selection 

On execution by RunSpeech, a QA control selects its prompt
in the following way:

ClientTest and the count attribute of each prompt are
evaluated in order. The first prompt with both ClientTest
and count true is played. A missing count is considered
true. A missing ClientTest is considered true.
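The selection rule can be expressed as a short Python sketch, exercised below with the Q_Ask example and the prompt-tapering example from the ClientTest description. Modeling ClientTest as a zero-argument callable is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Prompt:
    text: str
    count: Optional[int] = None                       # unspecified count counts as true
    client_test: Optional[Callable[[], bool]] = None  # unspecified ClientTest counts as true


def select_prompt(prompts: List[Prompt], question_count: int) -> Optional[Prompt]:
    """Return the first prompt whose count and ClientTest are both true."""
    for p in prompts:
        count_ok = p.count is None or p.count == question_count
        test_ok = p.client_test is None or p.client_test()
        if count_ok and test_ok:
            return p
    return None
```

For Q_Ask, a count of 1 yields "Hello", 2 yields "Hello again", and 3 matches nothing.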

1.3 Statement Control 

Statement controls are used for information-giving system
output when the activation of grammars is not required.
This is common in voice-only dialogs. Statements are played
only once per page if the playOnce attribute is true.






<Statement
    id="..."
    playOnce="..."
    ClientTest="..."
    PromptFunction="..." >
    <prompt ... />
</Statement>


bool playOnce 

The playOnce attribute specifies whether or not a statement
control may be activated more than once per page. playOnce
is a Boolean attribute with a default (if not specified) of
TRUE, i.e., the statement control is executed only once.
For example, the playOnce attribute may be used on
statement controls whose purpose is to output email
messages to the end user. Setting playOnce="False" will
provide dialog authors with the capability to enable a
"repeat" functionality on a page that reads email messages.

string ClientTest 

ClientTest specifies the client-side script function
returning a boolean value which determines under which
circumstances a statement control will be selected for
output. RunSpeech will activate the first Statement with
ClientTest equal to true. If not specified, the ClientTest
condition is considered true.
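A Python sketch of statement selection that honors both ClientTest and playOnce follows. How the two interact (skipping an already-played playOnce statement and moving on to the next candidate) is an assumption, as are the class names and the per-page `played` set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Statement:
    id: str
    play_once: bool = True  # default TRUE: executed only once per page
    client_test: Optional[Callable[[], bool]] = None


@dataclass
class StatementRunner:
    """Hypothetical per-page record of which statements have played."""
    played: Set[str] = field(default_factory=set)

    def select(self, statements: List[Statement]) -> Optional[Statement]:
        for s in statements:
            if s.play_once and s.id in self.played:
                continue  # already executed once on this page
            if s.client_test is None or s.client_test():
                self.played.add(s.id)
                return s
        return None
```

With the default playOnce, a welcome statement plays once per page, while a statement with play_once=False (the email "repeat" case) can be selected again.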

String PromptFunction

PromptFunction specifies a client-side function that will
be called once the statement control has been selected but
before the prompt is played. This gives authors a chance to
make any last-minute modifications to the prompt that may
be required.
Prompt[] Prompt

The prompt array specifies a list of prompt objects.
Prompts are also able to specify conditions of selection
(via client functions), and during RunSpeech execution only
the first prompt with a true condition is selected for
playback.

<Speech:QA
    id="QA_Welcome"
    ControlsToSpeechEnable="Label1"
    runat="server" >
    <Statement id="WelcomePrompt" >
        <prompt bargeIn="False"> Welcome </prompt>
    </Statement>
</Speech:QA>

1.4 Confirm Control 

Confirm controls are special types of question controls.
They may hold all the properties and objects of other
question controls, but they are activated differently. The
RunSpeech algorithm checks the confidence score returned
for the answer control of the ControlsToSpeechEnable
against that answer control's ConfirmThreshold. If the
score is too low, the confirm control is activated. If the
confidence score of the answer control is below the
ConfirmThreshold, then the binding is done but the
onClientReco method is not called. The dialog author may
specify more than one confirm control per QA control.
RunSpeech will determine which confirm control to activate
based on the function specified by ClientTest.
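The activation sequence just described can be sketched in Python: the binding always runs, onClientReco runs only when confidence reaches ConfirmThreshold, and otherwise the first confirm control whose ClientTest passes is chosen. The callables standing in for binding and onClientReco are hypothetical, and returning None when no confirm control qualifies is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConfirmControl:
    id: str
    client_test: Optional[Callable[[], bool]] = None


def handle_recognition(confidence: int, confirm_threshold: int,
                       confirms: List[ConfirmControl],
                       bind: Callable[[], None],
                       on_client_reco: Callable[[], None]
                       ) -> Optional[ConfirmControl]:
    """Bind the result, then either call onClientReco or pick a confirm.

    Returns the confirm control to activate, or None.
    """
    bind()  # the binding is done in both cases
    if confidence >= confirm_threshold:
        on_client_reco()
        return None
    # Low confidence: onClientReco is not called; the first confirm
    # control whose ClientTest is true (or absent) is activated.
    for confirm in confirms:
        if confirm.client_test is None or confirm.client_test():
            return confirm
    return None
```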






<Answer ConfirmThreshold=... />
<Confirm>
    ...all attributes and objects of Question...
</Confirm>
1.5 Answer control

The answer control is used to specify speech input
resources and features. It contains a set of grammars
related to the primary control. Note that an answer may be
used independently of a question, in multimodal
applications without prompts, for example, or in telephony
applications where user initiative may be enabled by extra
answers. Answer controls are activated directly by question
controls, by a triggering event, or by virtue of explicit
scope. An exception will be thrown if no grammar object is
specified in the answer control.



<Answer
    id="..."
    scope="..."
    StartEvent="..."
    StopEvent="..."
    ClientTest="..."
    onClientReco="..."
    onClientDTMF="..."
    autobind="..."
    server="..."
    ConfirmThreshold="..."
    RejectThreshold="..." >
    <grammar ... />
    <grammar ... />
    <dtmf ... />
    <dtmf ... />
    <bind ... />
    <bind ... />
</Answer>






string Scope 

Scope holds the id of any named element on the page. Scope
is used in the answer control for scoping the availability
of user initiative (mixed task initiative, i.e. service jump
digressions) grammars. If scope is specified in an answer
control, then it will be activated whenever a QA control
corresponding to a primary control within the subtree of
the element named by Scope is activated.
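The subtree test behind scoped activation can be sketched in Python, assuming the page's control tree is available as a child-to-parent map. The element IDs (Form1, Panel1, TextBox1, TextBox2) and answer names are hypothetical.

```python
from typing import Dict, List, Optional


def in_subtree(node: Optional[str], root: str,
               parent: Dict[str, str]) -> bool:
    """True if `node` is `root` or one of its descendants."""
    while node is not None:
        if node == root:
            return True
        node = parent.get(node)
    return False


def scoped_answers(active_primary: str, answer_scopes: Dict[str, str],
                   parent: Dict[str, str]) -> List[str]:
    """Scoped answer controls activated along with the QA control of
    `active_primary` (answer ID -> Scope element ID)."""
    return [answer for answer, scope in answer_scopes.items()
            if in_subtree(active_primary, scope, parent)]
```

For example, an answer scoped to the whole form is active for every textbox, whereas one scoped to a panel is active only for controls inside that panel.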

string StartEvent

StartEvent specifies the name of the event from the primary
control that will activate the answer control (start the
Reco object). This will typically be used in multi-modal
applications, e.g. onMouseDown, for tap-and-talk.


string StopEvent 

StopEvent specifies the name of the event from the primary
control that will deactivate the answer control (stop the
Reco object). This will typically be used in multi-modal
applications, e.g. onMouseUp, for tap-and-talk.

string ClientTest 

ClientTest specifies the client-side script function
returning a boolean value which determines under which
circumstances an answer control otherwise selected by scope
or by a question control will be considered active. For
example, the test could be used during confirmation for a
'correction' answer control to disable itself when
activated by a question control but mixed initiative is
not desired (leaving only accept/deny answer controls
active). Or a scoped answer control which permits a service
jump can determine more flexible means of activation by
specifying a test which is true or false depending on
another part of the dialog. If not specified, the answer
control's condition is considered true.

Grammar[] Grammars

Grammars accesses a list of grammar objects. 

DTMF[] DTMFs

DTMFs holds an array of DTMF objects. 


Bind[] Binds 

Binds holds a list of the bind objects necessary to map the
answer control grammar results (DTMF or spoken) into
control values. All binds specified for an answer will be
executed when the relevant output is recognized. If no bind
is specified, the SML output returned by recognition will
be bound to the control specified in the
ControlsToSpeechEnable of the QA control.

string OnClientReco

OnClientReco specifies the name of the client-side function 
to call when spoken recognition results become available. 

string OnClientDTMF 
OnClientDTMF holds the name of the client-side function to
call when DTMF recognition results become available.

boolean autobind 

The value of autobind determines whether or not the system
default bindings are implemented for a recognition return
from the answer control. If unspecified, the default is
true. Setting autobind to false is an instruction to the
system not to perform the automatic binding.

string server 

The server attribute is an optional attribute specifying
the URI of the speech server to perform the recognition.
This attribute overrides the URI of the global speech
server attribute.

integer ConfirmThreshold

Holds a value representing the confidence level below which 
a confirm control question will be automatically triggered 
immediately after an answer is recognized within the QA 
control. Legal values are 0-100. 


Note that where bind statements and onClientReco scripts
are both specified, the semantics of the resulting tags are
that binds are implemented before the script specified in
onClientReco.


integer RejectThreshold

RejectThreshold specifies the minimum confidence score to
consider returning a recognized utterance. If overall
confidence is below this level, a NoReco event will be
thrown. Legal values are 0-100.
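Taken together, RejectThreshold and ConfirmThreshold divide the 0-100 confidence range into three outcomes. The Python sketch below assumes RejectThreshold <= ConfirmThreshold and that the recognizer's confidence uses the same 0-100 scale; neither assumption is stated explicitly in the text.

```python
def classify(confidence: int, reject_threshold: int,
             confirm_threshold: int) -> str:
    """Map a recognition confidence to 'noreco', 'confirm' or 'accept'."""
    for name, value in (("confidence", confidence),
                        ("RejectThreshold", reject_threshold),
                        ("ConfirmThreshold", confirm_threshold)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0-100, got {value}")
    if confidence < reject_threshold:
        return "noreco"   # a NoReco event would be thrown
    if confidence < confirm_threshold:
        return "confirm"  # a confirm control question would be triggered
    return "accept"
```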

1.5.1 Grammar 

The grammar object contains information on the selection
and content of grammars, and the means for processing
recognition results. All the properties defined are
read/write properties.
<Grammar
    ClientTest="..."
    Source="..."
>
    ...grammar rules...
</Grammar>

string ClientTest 

The ClientTest property references a client-side boolean
function which determines under which conditions a grammar
is active. If multiple grammars are specified within an
answer control (e.g. to implement a system/mixed initiative
strategy, or to reduce the perplexity of possible answers
when the dialog is going badly), only the first grammar
with a true ClientTest function will be selected for
activation during RunSpeech execution. If this property is
unspecified, true is assumed.

string Source 

Source accesses the URI of the grammar to load, if
specified.

string InlineGrammar 

InlineGrammar accesses the text of the grammar if specified
inline. If that property is not empty, the Source attribute
is ignored.

1.5.2 Bind 

The object model for bind closely follows its counterpart
client-side tags. Binds may be specified both for spoken
grammar and for DTMF recognition returns in a single answer
control.
<bind
    Value="..."
    TargetElement="..."
    TargetAttribute="..."
    Test="..."
/>

string Value 

Value specifies the text that will be bound into the target element. It is specified as an XPath on the SML output from recognition.

string TargetElement 
TargetElement specifies the id of the primary control to which the bind statement applies. If not specified, this is assumed to be the ControlsToSpeechEnable of the relevant QA control.

string TargetAttribute

TargetAttribute specifies the attribute on the TargetElement control in which to bind the value. If not specified, this is assumed to be the Text property of the target element.

string Test 

The Test attribute specifies a condition which must evaluate to true for the binding to be performed. It is specified as an XML Pattern on the SML output from recognition.
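The defaulting rules for a bind statement can be sketched as follows. Real SML is XML queried with XPath. As a simplifying assumption, SML here is a nested JavaScript object, `resolvePath` handles only simple absolute paths, and the Test condition is reduced to "the path exists". The helper names are hypothetical.

```javascript
// Simplified stand-in for XPath: resolves "/sml/city" against a nested
// object such as { sml: { city: "Seattle" } }. Real SML is XML.
function resolvePath(sml, path) {
  return path.split("/").filter(Boolean)
    .reduce((node, step) => (node == null ? undefined : node[step]), sml);
}

// Applies one bind statement. `controls` maps control ids to objects and
// `defaultTarget` is the id from the QA control's ControlsToSpeechEnable.
function applyBind(bind, sml, controls, defaultTarget) {
  // Test must hold (here: its path exists) for the bind to take effect.
  if (bind.test && resolvePath(sml, bind.test) === undefined) return false;
  const value = resolvePath(sml, bind.value);
  if (value === undefined) return false;
  const targetId = bind.targetElement || defaultTarget;
  const target = controls[targetId];
  if (!target) throw new Error("unknown target control: " + targetId);
  // TargetAttribute defaults to the Text property.
  target[bind.targetAttribute || "Text"] = String(value);
  return true;
}

const controls = { TextBox1: { Text: "" }, TextBox2: { Text: "" } };
const sml = { sml: { path1: "Puyallup", path2: "Yakima" } };
applyBind({ value: "/sml/path1" }, sml, controls, "TextBox1");
applyBind({ value: "/sml/path2", targetElement: "TextBox2" }, sml, controls, "TextBox1");
console.log(controls.TextBox1.Text, controls.TextBox2.Text); // Puyallup Yakima
```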



1.5.2.1 Automatic binding

The default behavior on recognition return to a speech-enabled primary control is to bind certain properties into that primary control. This is useful for the dialog controls to examine the recognition results from the primary controls across turns (and even pages). Unless an autobind="False" attribute is specified on an answer control, the answer control will perform the following actions on the primary control upon receiving recognition results:

1. bind the SML output tree into the SML attribute;

2. bind the text of the utterance into the SpokenText attribute;

3. bind the confidence score returned by the recognizer into the Confidence attribute.

Any values already held in these attributes will be overwritten. Automatic binding occurs before any author-specified bind commands, and hence before any onClientReco script (which may also bind to these properties).
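The ordering guarantee above (automatic binding, then author-specified binds, then onClientReco) can be sketched as a single handler. The `onRecognition` function, the shape of the result object and the use of a boolean `autobind` flag in place of the autobind="False" attribute are assumptions made for illustration.

```javascript
// Hypothetical handler run when an answer control receives a result.
// `answer.binds` holds author bind statements as functions; the returned
// array records the order in which the stages ran.
function onRecognition(answer, result, primary) {
  const stages = [];
  if (answer.autobind !== false) {
    // Automatic binding: any values already held are overwritten.
    primary.SML = result.sml;
    primary.SpokenText = result.text;
    primary.Confidence = result.confidence;
    stages.push("autobind");
  }
  for (const bind of answer.binds || []) {
    bind(result, primary); // author-specified bind statements
    stages.push("bind");
  }
  if (typeof answer.onClientReco === "function") {
    answer.onClientReco(result, primary); // runs last and may re-bind
    stages.push("onClientReco");
  }
  return stages;
}

const primary = { Text: "", SpokenText: "old" };
const steps = onRecognition(
  {
    binds: [(r, p) => { p.Text = r.text; }],
    onClientReco: (r, p) => { p.Text = p.Text.toUpperCase(); },
  },
  { sml: "<sml/>", text: "seattle", confidence: 82 },
  primary);
console.log(steps.join(" > "), primary.Text, primary.SpokenText); // autobind > bind > onClientReco SEATTLE seattle
```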
1.5.3 DTMF

DTMF may be used by answer controls in telephony applications. The DTMF object essentially applies a different modality of grammar (a keypad input grammar rather than a speech input grammar) to the same answer. The DTMF content model closely matches that of the client side output Tags DTMF element. Binding mechanisms for DTMF returns are specified using the targetAttribute attribute of the DTMF object.

<DTMF
    firstTimeOut="..."
    interDigitTimeOut="..."
    numDigits="..."
    flush="..."
    escape="..."
    targetAttribute="..."
    ClientTest="...">
    <DTMFGrammar ...>
</DTMF>

integer firstTimeOut 

The number of milliseconds to wait between activation and the first key press before raising a timeout event.

integer interDigitTimeOut

The number of milliseconds to wait between key presses before raising a timeout event.

integer numDigits

The maximum number of key inputs permitted during DTMF recognition.
bool flush

A flag which states whether or not to flush the telephony server's DTMF buffer before recognition begins. Setting flush to false permits DTMF key input to be stored between recognition/page calls, which permits the user to 'type-ahead'.

string escape

Holds the string value of the key which will be used to end DTMF recognition (e.g. '#').

string targetAttribute

TargetAttribute specifies the property on the primary control in which to bind the value. If not specified, this is assumed to be the Text property of the primary control.

string ClientTest

The ClientTest property references a client-side boolean function which determines under which conditions a DTMF grammar is active. If multiple grammars are specified within a DTMF object, only the first grammar with a true ClientTest function will be selected for activation during RunSpeech execution. If this property is unspecified, true is assumed.
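The interaction of firstTimeOut, interDigitTimeOut, numDigits, escape and flush can be simulated offline. This sketch assumes that key presses arrive as timestamped events and that buffered type-ahead keys count as pressed at activation time. `collectDtmf` and its result shape are hypothetical, and a real telephony server would drive this logic with timers rather than a list.

```javascript
// Hypothetical simulation of DTMF collection. `events` are key presses with
// timestamps in ms since activation; `buffered` holds type-ahead keys left
// in the telephony server's DTMF buffer by an earlier recognition or page.
function collectDtmf(opts, events, buffered = []) {
  // flush=true discards the type-ahead buffer before recognition begins.
  const typeAhead = opts.flush ? [] : buffered.map(key => ({ key, atMs: 0 }));
  const digits = [];
  let lastMs = 0;
  for (const { key, atMs } of typeAhead.concat(events)) {
    // firstTimeOut applies before the first key, interDigitTimeOut after it.
    const limit = digits.length === 0 ? opts.firstTimeOut : opts.interDigitTimeOut;
    if (atMs - lastMs > limit) return { status: "timeout", digits: digits.join("") };
    if (key === opts.escape) return { status: "escape", digits: digits.join("") };
    digits.push(key);
    lastMs = atMs;
    if (digits.length >= opts.numDigits) return { status: "complete", digits: digits.join("") };
  }
  // Input ran out before any limit was hit: the pending timer would fire.
  return { status: "timeout", digits: digits.join("") };
}

const dtmfOpts = { firstTimeOut: 5000, interDigitTimeOut: 2000, numDigits: 4, flush: false, escape: "#" };
const r = collectDtmf(dtmfOpts, [{ key: "3", atMs: 1200 }, { key: "#", atMs: 1800 }], ["1"]);
console.log(r.status, r.digits); // escape 13
```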

1.5.4 DTMFGrammar

DTMFGrammar maps a key to an output value associated with the key. The following sample shows how to map the "1" and "2" keys to text output values.

<DTMFGrammar>
    <key value="1">Seattle</key>
    <key value="2">Boston</key>
</DTMFGrammar>
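A DTMFGrammar is essentially a lookup table from key to output value. The sketch below models the sample grammar as a Map and binds the mapped value using the targetAttribute defaulting rule of the DTMF object. `bindDtmfKey` is a hypothetical helper, and the handling of an unmapped key is left to the caller.

```javascript
// Key-to-value map mirroring the <key value="..."> entries in the sample.
const cityGrammar = new Map([["1", "Seattle"], ["2", "Boston"]]);

// Looks up a key press and binds the mapped value into the primary control,
// using targetAttribute when given and the Text property otherwise.
// Returns false for an unmapped key (how that is reported, e.g. as a
// NoReco, is an assumption left to the caller).
function bindDtmfKey(grammar, key, primary, targetAttribute) {
  if (!grammar.has(key)) return false;
  primary[targetAttribute || "Text"] = grammar.get(key);
  return true;
}

const cityBox = { Text: "" };
bindDtmfKey(cityGrammar, "2", cityBox);
console.log(cityBox.Text); // Boston
```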
1.6 Command control

The command control is a special variation of answer control which can be defined in any QA control. Command controls are forms of user input which are not answers to the question at hand (e.g. Help, Repeat, Cancel), and which do not need to bind recognition results into primary controls. If the QA control specifies an activation scope, the command grammar is active for every QA control within that scope. Hence a command does not need to be activated directly by a question control or an event, and its grammars are activated in parallel, independently of the process that activates answer controls. Command controls of the same type at QA controls lower in scope can override superior commands with context-sensitive behavior (and even different or extended grammars if necessary).

<Command
    id="..."
    scope="..."
    type="..."
    RejectThreshold="..."
    onClientReco="...">
    <Grammar ...>
    <dtmf ...>
</Command>

string Scope

Scope holds the id of a primary control. Scope is used in command controls for scoping the availability of the command grammars. If scope is specified for a command control, the command's grammars will be activated whenever a QA control corresponding to a primary control within the subtree of the contextual control is activated.

string type 

Type specifies the type of command (e.g. 'help', 'cancel', etc.) in order to allow the overriding of identically typed commands at lower levels of the scope tree. Any string value is possible in this attribute, so it is up to the author to ensure that types are used correctly.

integer RejectThreshold

RejectThreshold specifies the minimum recognition confidence level necessary to trigger the command (this is likely to be used when higher than usual confidence is required, e.g. before executing the result of a 'Cancel' command). Legal values are 0-100.

string onClientReco 

onClientReco specifies the client-side script function to execute on recognition of the command control's grammar.

Grammar Grammar 

The grammar object which will listen for the command.

DTMF DTMF

The dtmf object which will activate the command. 
2 Types of Initiatives and Dialog Flows

Using the controls described above, various forms of initiative can be developed; some examples are provided below:

2.1 Mixed initiative Dialogs 

Mixed initiative dialogs provide the capability of accepting input for multiple controls with the asking of a single question. For example, the answer to the question "what are your travel plans" may provide values for an origin city textbox control, a destination city textbox control and a calendar control ("Fly from Puyallup to Yakima on September 30th").

A robust way to encode mixed initiative dialogs is to handwrite the mixed initiative grammar and relevant binding statements, and apply these to a single control.

The following example shows a single page used for a simple mixed initiative voice interaction about travel. The first QA control specifies the mixed initiative grammar and binding, and a relevant prompt asking for two items. The second and third QA controls are not mixed initiative, and so bind directly to their respective primary controls by default (so no bind statements are required). The RunSpeech algorithm will select the QA controls based on an attribute "SpeechIndex" and whether or not their primary controls hold valid values.

<%@ Page language="c#" AutoEventWireup="false"
    inherits="SDN.Page" %>
<%@ Register tagPrefix="SDN" Namespace="SDN" Assembly="SDN" %>

<html>
<body>

<Form id="WebForm1" method=post runat="server">
    <ASP:Label id="Label1" runat="server">Departure city</ASP:Label>
    <ASP:TextBox id="TextBox1" runat="server" />
    <br>
    <ASP:Label id="Label2" runat="server">Arrival city</ASP:Label>
    <ASP:TextBox id="TextBox2" textchanged="TextChanged"
        runat="server" />

    <!-- speech information -->
    <Speech:QA id="QAmixed" controlsToSpeechEnable="TextBox1"
        speechIndex="1" runat="server">
        <Question id="Q1" Answers="A1">
            <prompt>"Please say the cities you want to fly from and to"</prompt>
        </Question>

        <Answer id="A1">
            <grammar src="..." />
            <bind targetElement="TextBox1"
                value="/sml/path1" />
            <bind targetElement="TextBox2"
                value="/sml/path2" />
        </Answer>
    </Speech:QA>

    <Speech:QA id="QA1" controlsToSpeechEnable="TextBox1"
        speechIndex="2" runat="server">
        <Question id="Q1" Answers="A1">
            <prompt>"What's the departure city?"</prompt>
        </Question>

        <Answer id="A1">
            <grammar src="..." />
        </Answer>
    </Speech:QA>

    <Speech:QA id="QA2" controlsToSpeechEnable="TextBox2"
        speechIndex="3" runat="server">
        <Question id="Q1" Answers="A1">
            <prompt>"What's the arrival city?"</prompt>
        </Question>

        <Answer id="A1">
            <grammar src="..." />
        </Answer>
    </Speech:QA>

</Form>
</body>
</html>

2.2 Complex Mixed Initiative 

Application developers can specify several answers to the same question control with different levels of initiative. Conditions are specified that will select one of the answers when the question is asked, depending on the initiative settings that they require. An example is provided below:

<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="Panel2"
    runat="server">

    <Question answers="systemInitiative, mixedInitiative" .../>

    <Answer id="systemInitiative"
        ClientTest="systemInitiativeCond"
        onClientReco="SimpleUpdate">
        <grammar src="systemInitiative.gram" />
    </Answer>

    <Answer id="mixedInitiative"
        ClientTest="mixedInitiativeCond"
        onClientReco="MixedUpdate">
        <grammar src="mixedInitiative.gram" />
    </Answer>
</Speech:QA>

Application developers can also specify several question controls in a QA control. Some question controls can allow a mixed initiative style of answer, whilst others are more directed. By authoring conditions on these question controls, application developers can select between the questions depending on the dialogue situation.

In the following example the mixed initiative question asks the value of the two textboxes at the same time (e.g., 'what are your travel plans?') and calls the mixed initiative answer (e.g., 'from London to Seattle'). If this fails, then the value of each textbox is asked separately (e.g., 'where do you leave from' and 'where are you going to') but, depending on the conditions, the mixed-initiative grammar may still be activated, thus allowing users to provide both values.

<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="TextBox1, TextBox2"
    runat="server">

    <Question
        ClientTest="AllEmpty()"
        answers="AnsAll"
        .../>
    <Question
        ClientTest="TextBox1IsEmpty()"
        answers="AnsAll, AnsTextBox1" .../>
    <Question
        ClientTest="TextBox2IsEmpty()"
        answers="AnsAll, AnsTextBox2" .../>

    <Answer
        id="AnsTextBox1"
        onClientReco="SimpleUpdate">
        <grammar src="AnsTextBox1.gram" />
    </Answer>
    <Answer
        id="AnsTextBox2"
        onClientReco="SimpleUpdate">
        <grammar src="AnsTextBox2.gram" />
    </Answer>
    <Answer
        id="AnsAll"
        ClientTest="IsMixedInitAllowed()"
        onClientReco="MixedUpdate">
        <grammar src="AnsAll.gram" />
    </Answer>
</Speech:QA>
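The selection logic behind this example can be traced in plain JavaScript: the first question whose ClientTest holds is asked, and only the referenced answers whose own ClientTest holds are activated. The ClientTest names come from the markup. Their bodies, the `page` object and the `failedTurns` counter are illustrative assumptions.

```javascript
// Hypothetical page state; the ClientTest bodies below are illustrative
// assumptions, only their names come from the markup above.
const page = { TextBox1: "", TextBox2: "", failedTurns: 0 };
const AllEmpty = () => page.TextBox1 === "" && page.TextBox2 === "";
const TextBox1IsEmpty = () => page.TextBox1 === "";
const TextBox2IsEmpty = () => page.TextBox2 === "";
const IsMixedInitAllowed = () => page.failedTurns < 2;

// Questions in source order, with the answers each one references.
const questions = [
  { clientTest: AllEmpty, answers: ["AnsAll"] },
  { clientTest: TextBox1IsEmpty, answers: ["AnsAll", "AnsTextBox1"] },
  { clientTest: TextBox2IsEmpty, answers: ["AnsAll", "AnsTextBox2"] },
];
// Answer-level ClientTests; answers not listed have none (always true).
const answerTests = { AnsAll: IsMixedInitAllowed };

// Picks the first question whose ClientTest holds, then keeps only the
// referenced answers whose own ClientTest holds.
function selectTurn() {
  const question = questions.find(q => q.clientTest());
  if (!question) return null;
  return question.answers.filter(id => !answerTests[id] || answerTests[id]());
}

console.log(selectTurn()); // [ 'AnsAll' ]
page.TextBox1 = "London";
console.log(selectTurn()); // [ 'AnsAll', 'AnsTextBox2' ]
page.failedTurns = 2;
console.log(selectTurn()); // [ 'AnsTextBox2' ]
```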

2.3 User initiative

Similar to the command control, a standard QA control can specify a scope for the activation of its grammars. Like a command control, this QA control will activate the grammar from a relevant answer control whenever another QA control is activated within the scope of this context. Note that its question control will only be asked if the QA control itself is activated.

<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="Panel2"
    runat="server">

    <Question ... />
    <Answer id="AnswerPanel2"
        scope="Panel2"
        onClientReco="UpdatePanel2()">
        <grammar src="Panel2.gram" />
    </Answer>
</Speech:QA>

This is useful for dialogs which allow 'service jumping': user responses about some part of the dialog which is not directly related to the question control at hand.

2.4 Short time-out confirms 

Application developers can write a confirmation as usual but set a short time-out. In the timeout handler, code is provided that accepts the current value as correct.

<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="Panel2"
    runat="server">

    <Confirm timeOut="20"
        onClientTimeOut="AcceptConfirmation" ... />
    <Answer id="CorrectPanel2"
        onClientReco="UpdatePanel2()">
        <grammar src="Panel2.gram" />
    </Answer>
</Speech:QA>
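One way to express the short time-out confirmation as a decision function is sketched below. It assumes three possible outcomes of the confirm turn: a timeout (accept the value, as an AcceptConfirmation handler would), a recognized correction (as via the CorrectPanel2 answer), or anything else. `resolveConfirmation` is hypothetical, and treating a correction as confirmed is an assumption.

```javascript
// Hypothetical resolution of a confirmation turn with a short timeout.
function resolveConfirmation(event, currentValue) {
  switch (event.type) {
    case "timeout":
      // Silence within the short timeout: accept the current value.
      return { value: currentValue, confirmed: true };
    case "reco":
      // The user supplied a correction; accepting it is an assumption.
      return { value: event.text, confirmed: true };
    default:
      // e.g. NoReco: keep the value but leave it unconfirmed.
      return { value: currentValue, confirmed: false };
  }
}

console.log(resolveConfirmation({ type: "timeout" }, "Seattle")); // { value: 'Seattle', confirmed: true }
```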

2.5 Dynamic prompt building and editing 

The promptFunction script is called after a question control is selected but before a prompt is chosen and played. This lets application developers build or modify the prompt at the last minute. In the example below, this is used to change the prompt depending on the level of experience of the user.

<script language="javascript">
function GetPrompt() {
    if (experiencedUser == true)
        Prompt1.Text = "What service do you want?";
    else
        Prompt1.Text = "Please choose between e-mail, calendar and news";
    return;
}
</script>
<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="Panel2"
    runat="server">
    <Question PromptFunction="GetPrompt" ...>
        <Prompt id="Prompt1" />
    </Question>
    <Answer ... />
</Speech:QA>
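The timing of the prompt function can be shown with a runnable version of the example. `askQuestion` is a hypothetical stand-in for the point in RunSpeech where a selected question's prompt is played, and Prompt1 is modelled as a plain object with a Text property.

```javascript
// Hypothetical sketch of where a PromptFunction runs: after the question
// is selected and before its prompt text is played.
function askQuestion(question, play) {
  if (typeof question.promptFunction === "function") {
    question.promptFunction(); // may rewrite the prompt's Text
  }
  play(question.prompt.Text);
}

// Runnable counterpart of GetPrompt from the example above.
let experiencedUser = false;
const Prompt1 = { Text: "" };
function GetPrompt() {
  Prompt1.Text = experiencedUser
    ? "What service do you want?"
    : "Please choose between e-mail, calendar and news";
}

const played = [];
const question = { promptFunction: GetPrompt, prompt: Prompt1 };
askQuestion(question, text => played.push(text));
experiencedUser = true;
askQuestion(question, text => played.push(text));
console.log(played);
```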

2.6 Using semantic relationships 

Recognition and use of semantic relationships can be done by studying the result of the recognizer inside the onReco event handler.

<script language="javascript">
function Reco() {
    /*
    Application developers can access the SML returned by the
    recognizer or recognition server. If a semantic relationship
    (like sport-news) is identified, the confidence of the individual
    elements can be increased, or any other appropriate action taken.
    */
}
</script>
<Speech:QA
    id="QA_Panel2"
    ControlsToSpeechEnable="Panel2"
    runat="server">

    <Question ... />
    <Answer onClientReco="Reco">
        <grammar src="Panel2.gram" />
    </Answer>
</Speech:QA>
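The source leaves the body of Reco as a comment. One possible sketch assumes the SML has been converted to a list of named items with 0-100 confidence scores, and boosts both members of a related pair when they co-occur. `RELATED_PAIRS`, `BOOST` and the SML shape are all hypothetical.

```javascript
// Hypothetical related concept pairs and boost amount.
const RELATED_PAIRS = [["sport", "news"]];
const BOOST = 15;

// SML is modelled as { items: [{ name, confidence }] } with 0-100 scores.
function Reco(sml) {
  const names = new Set(sml.items.map(item => item.name));
  for (const [a, b] of RELATED_PAIRS) {
    if (!(names.has(a) && names.has(b))) continue; // relationship not present
    for (const item of sml.items) {
      if (item.name === a || item.name === b) {
        item.confidence = Math.min(100, item.confidence + BOOST);
      }
    }
  }
  return sml;
}

const result = Reco({ items: [
  { name: "sport", confidence: 55 },
  { name: "news", confidence: 90 },
  { name: "weather", confidence: 40 },
] });
console.log(result.items.map(i => `${i.name}:${i.confidence}`).join(" ")); // sport:70 news:100 weather:40
```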
3 Implementation and Application of RunSpeech

A mechanism is needed to provide voice-only clients with the information necessary to properly render speech-enabled pages. Such a mechanism must provide the execution of dialog logic and maintain the state of user prompting and grammar activation as specified by the application developer.

Such a mechanism is not needed for multimodal clients. In the multimodal case, the page containing speech-enabled controls is visible to the user of the client device. The user of the client device may provide speech input into any visible speech-enabled control in any desired order using a multimodal paradigm.

The mechanism used by voice-only clients to render speech-enabled pages is the RunSpeech script or algorithm. The RunSpeech script relies upon the SpeechIndex attribute of the QA control and the SpeechGroup control discussed below.

3.1 SpeechControl

During run time, the system parses a control script or webpage having the server controls and creates a tree structure of server controls. Normally the root of the tree is the Page control. If the control script uses a custom or user control, the child tree of this custom or user control is expanded. Every node in the tree has an ID, and name conflicts can easily arise in the tree when it expands. To deal with possible name conflicts, the system includes a concept of NamingContainer. Any node in the tree can implement NamingContainer, and its children live within that namespace.

The QA controls can appear anywhere in the server control tree. In order to easily deal with SpeechIndex and manage client side rendering, a SpeechGroup control is provided. The SpeechGroup control is hidden from the application developer.

One SpeechGroup control is created and logically attached to every NamingContainer node that contains QA controls in its child tree. QA and SpeechGroup controls are considered members of their direct NamingContainer's SpeechGroup. The top level SpeechGroup control is attached to the Page object. This membership logically constructs a tree (a logical speech tree) of QA controls and SpeechGroup controls.

For simple speech-enabled pages or scripts (i.e., pages that do not contain other NamingContainers), only the root SpeechGroup control is generated and placed in the page's server control tree before the page is sent to the voice-only client. The SpeechGroup control maintains information regarding the number and rendering order of QA controls on the page.

For pages containing a combination of QA control(s) and NamingContainer(s), multiple SpeechGroup controls are generated: one SpeechGroup control for the page (as described above) and a SpeechGroup control for each NamingContainer. For a page containing NamingContainers, the page-level SpeechGroup control maintains QA control information as described above, as well as the number and rendering order of composite controls. The SpeechGroup control associated with each NamingContainer maintains the number and rendering order of QAs within each composite.

The main job of the SpeechGroup control is to maintain the list of QA controls and SpeechGroups on each page and/or the list of QA controls comprising a composite control. When the client side markup script (e.g. HTML) is generated, each SpeechGroup writes out a QACollection object on the client side. A QACollection has a list of QA controls and QACollections. This corresponds to the logical server side speech tree. The RunSpeech script will query the page-level QACollection object for the next QA control to invoke during voice-only dialog processing.

The page level SpeechGroup control located on each page is also responsible for:

■ Determining that the requesting client is a voice-only client; and

■ Generating common script and supporting structures for all QA controls on each page.

When the first SpeechGroup control renders, it queries the System.Web.UI.Page.Request.Browser property for the browser string. This property is then passed to the RenderSpeechHTML and RenderSpeechScript methods for each QA control on the page. The QA control will then render for the appropriate client (multimodal or voice-only).
3.2 Creation of SpeechGroup controls

During server-side page loading, the onLoad event is sent to each control on the page. The page-level SpeechGroup control is created by the first QA control receiving the onLoad event. The creation of SpeechGroup controls is done in the following manner (assuming a page containing composite controls):

Every QA control will receive the onLoad event from run time code. onLoad for a QA:

• Get the QA's NamingContainer N1

• Search for a SpeechGroup in N1's children

  o If it already exists, register the QA control with this SpeechGroup. onLoad returns.

  o If not found:

    ■ Create a new SpeechGroup G1 and insert it into N1's children

    ■ If N1 is not the Page, find N1's NamingContainer N2

    ■ Search for a SpeechGroup in N2's children; if one exists, say G2, add G1 to G2. If not, create a new one, G2, and insert it into N2's children

    ■ Recurse until the NamingContainer is the Page (top level)
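The onLoad procedure above can be traced on a small control tree. `ControlNode`, `findGroup` and `createGroup` are hypothetical stand-ins for the server-side control classes. The sketch also registers the QA control with a newly created G1, which the listed steps imply but do not state.

```javascript
// Minimal, hypothetical model of the server control tree.
class ControlNode {
  constructor(id, { namingContainer = false, page = false } = {}) {
    this.id = id;
    this.page = page;
    this.namingContainer = namingContainer || page; // the Page is a NamingContainer
    this.parent = null;
    this.children = [];
  }
  add(child) {
    child.parent = this;
    this.children.push(child);
    return child;
  }
  // Nearest ancestor implementing NamingContainer.
  get container() {
    let node = this.parent;
    while (node && !node.namingContainer) node = node.parent;
    return node;
  }
}

const findGroup = container => container.children.find(c => c.isSpeechGroup);

function createGroup(container) {
  const group = container.add(new ControlNode(container.id + "_SpeechGroup"));
  group.isSpeechGroup = true;
  group.members = [];
  return group;
}

// onLoad for a QA control, following the steps listed above.
function onLoadQA(qa) {
  let n1 = qa.container;
  const existing = findGroup(n1);
  if (existing) { existing.members.push(qa); return; } // register and return
  let g1 = createGroup(n1);
  g1.members.push(qa);
  while (!n1.page) { // recurse until the NamingContainer is the Page
    const n2 = n1.container;
    const g2 = findGroup(n2);
    if (g2) { g2.members.push(g1); return; }
    const created = createGroup(n2);
    created.members.push(g1);
    n1 = n2;
    g1 = created;
  }
}

const root = new ControlNode("Page", { page: true });
const userControl = root.add(new ControlNode("UserControl1", { namingContainer: true }));
const qaA = userControl.add(new ControlNode("QA_A"));
const qaB = root.add(new ControlNode("QA_B"));
onLoadQA(qaA);
onLoadQA(qaB);
console.log(findGroup(root).members.map(m => m.id)); // [ 'UserControl1_SpeechGroup', 'QA_B' ]
```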

During server-side page rendering, the Render event is sent to the speech-enabled page. When the page-level SpeechGroup control receives the Render event, it generates client side script to include RunSpeech.js and inserts it into the page that is eventually sent to the client device. It also calls all its direct children to render speech related HTML and scripts. If a child is a SpeechGroup, the child in turn calls its own children. In this manner, the server rendering happens along the server side logical speech tree.

When a SpeechGroup renders, it lets its children (which can be either QA or SpeechGroup controls) render speech HTML and scripts in the order of their SpeechIndex. But a SpeechGroup is hidden and doesn't naturally have a SpeechIndex. In fact, a SpeechGroup will have the same SpeechIndex as its NamingContainer, the one it attaches to. The NamingContainer is usually a UserControl or other visible control, and an author can set its SpeechIndex.
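Rendering along the logical speech tree in SpeechIndex order amounts to a recursive, sorted traversal. The sketch assumes each node already carries a numeric speechIndex, with a SpeechGroup having copied its NamingContainer's value. `renderOrder` and the object shapes are hypothetical.

```javascript
// Hypothetical logical speech tree: QA controls are leaves with an id;
// SpeechGroups have `members` and a speechIndex copied from the
// NamingContainer they are attached to.
function renderOrder(group) {
  const out = [];
  // Array.prototype.sort is stable (guaranteed since ES2019), so equal
  // indices keep their member order.
  const members = [...group.members].sort((a, b) => a.speechIndex - b.speechIndex);
  for (const member of members) {
    if (Array.isArray(member.members)) out.push(...renderOrder(member)); // nested SpeechGroup
    else out.push(member.id); // a QA control renders its speech HTML and script
  }
  return out;
}

const pageGroup = {
  members: [
    { id: "QA3", speechIndex: 3 },
    { speechIndex: 2, members: [ // SpeechGroup of a UserControl with SpeechIndex 2
      { id: "UC_QA2", speechIndex: 2 },
      { id: "UC_QA1", speechIndex: 1 },
    ] },
    { id: "QA1", speechIndex: 1 },
  ],
};
console.log(renderOrder(pageGroup).join(", ")); // QA1, UC_QA1, UC_QA2, QA3
```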

3.3 RunSpeech

The purpose of RunSpeech is to permit dialog flow via logic which is specified in script or logic on the client. In one embodiment, RunSpeech is specified in an external script file, and loaded by a single line generated by the server-side rendering of the SpeechGroup control, e.g.:

<script language="javascript"
    src="/scripts/RunSpeech.js" />

The RunSpeech.js script file should expose a means for validating on the client that the script has loaded correctly and has the right version id, etc. The actual validation script will be automatically generated by the page class as inline functions that are executed after the attempt to load the file.

Linking to an external script is functionally equivalent to specifying it inline, yet it is both more efficient, since browsers are able to cache the file, and cleaner, since the page is not cluttered with generic functions.

3.4 Events 

3.4.1 Event wiring 

Tap-and-talk multimodality can be enabled by coordinating the activation of grammars with the onMouseDown event. The wiring script to do this will be generated by the Page based on the relationship between controls (as specified in the ControlsToSpeechEnable property of the QA control).

For example, given an asp:TextBox and its companion QA control adding a grammar, the <input> and <reco> elements are output by each control's Render method. The wiring mechanism to add the grammar activation command is performed by client-side script generated by the Page, which changes the attribute of the primary control to add the activation command before any existing handler for the activation event:

<!-- Control output -->
<input id="TextBox1" type="text" .../>
<reco id="Reco1" ...>
    <grammar src="..." />
</reco>

<!-- Page output -->
<script>
    TextBox1.onMouseDown =
        "Reco1.Start();" + TextBox1.onMouseDown;
</script>

By default, hookup is via the onmousedown and onmouseup events, but both StartEvent and StopEvent can be set by the web page author.

The textbox output remains independent of this modification, and the event is processed as normal if other handlers were present.
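The wiring idea (run the activation command, then any handler that was already present) can be sketched with function composition instead of the string concatenation used by the generated script. `wireEvent` and the plain objects standing in for TextBox1 and Reco1 are hypothetical.

```javascript
// Hypothetical helper: wraps any existing handler so that the recognition
// command runs first and the original handler still runs afterwards.
function wireEvent(control, eventName, command) {
  const existing = control[eventName];
  control[eventName] = function (...args) {
    command(); // e.g. Reco1.Start() for StartEvent, Reco1.Stop() for StopEvent
    if (typeof existing === "function") return existing.apply(this, args);
    return undefined;
  };
}

const log = [];
const Reco1 = { Start: () => log.push("Reco1.Start"), Stop: () => log.push("Reco1.Stop") };
const TextBox1 = { onmousedown: () => log.push("existing handler") };
wireEvent(TextBox1, "onmousedown", () => Reco1.Start());
wireEvent(TextBox1, "onmouseup", () => Reco1.Stop());
TextBox1.onmousedown();
TextBox1.onmouseup();
console.log(log); // [ 'Reco1.Start', 'existing handler', 'Reco1.Stop' ]
```

In a live page, element.addEventListener is the usual way to add a listener without replacing existing ones. Listeners then run in registration order, however, rather than strictly before a previously installed handler.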

3.4.2 Page Class properties 

The Page also contains the following properties which are available to the script at runtime:

SML - a name/value pair for the ID of the control and its associated SML returned by recognition.

SpokenText - a name/value pair for the ID of the control and its associated recognized utterance.

Confidence - a name/value pair for the ID of the control and its associated confidence returned by the recognizer.

4 RunSpeech Algorithm

The RunSpeech algorithm is used to drive dialog flow on the client device. This may involve system prompting and dialog management (typically for voice-only dialogs), and/or processing of speech input (voice-only and multimodal dialogs). It is specified as a script file referenced by URI from every relevant speech-enabled page (equivalent to inline embedded script).

Rendering of the page for voice-only browsers is done in the following manner:

The RunSpeech module or function works as follows (RunSpeech is called in response to document.readyState becoming "complete"):
(1) Find the first active QA control in speech index order (determining whether a QA control is active is explained below).

(2) If there is no active QA control, submit the page.

(3) Otherwise, run the QA control.
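The three steps above map onto a short function. Here the QA activity test is passed in as a parameter: a simplified stand-in that only checks whether the primary text box is empty. `runSpeech`, `runQA` and `submit` are hypothetical callbacks rather than the actual RunSpeech.js API.

```javascript
// Hypothetical skeleton of the top-level RunSpeech step. The caller
// supplies isActive (the QA activity test), runQA and submit.
function runSpeech(qaControls, isActive, runQA, submit) {
  // (1) first active QA control in speech index order
  const ordered = [...qaControls].sort((a, b) => a.speechIndex - b.speechIndex);
  const next = ordered.find(isActive);
  // (2) no active QA control: submit the page
  if (!next) return submit();
  // (3) otherwise run it
  return runQA(next);
}

// Usage modelled on the travel page: a QA control counts as active while
// its text box is still empty (a simplification of the full test).
const values = { TextBox1: "Puyallup", TextBox2: "" };
const qas = [
  { id: "QA2", speechIndex: 3, target: "TextBox2" },
  { id: "QAmixed", speechIndex: 1, target: "TextBox1" },
  { id: "QA1", speechIndex: 2, target: "TextBox1" },
];
const outcome = runSpeech(qas, qa => values[qa.target] === "",
  qa => `run ${qa.id}`, () => "submit");
console.log(outcome); // run QA2
```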

A QA control is considered active if and only if:

(1) The QA control's ClientTest either is not present or returns true, AND

(2) The QA control contains an active question control or statement control (tested in source order), AND

(3) Either:

  a. The QA control contains only statement controls, OR

  b. At least one of the controls referenced by the QA control's ControlsToSpeechEnable has an empty or default value.

A question control is considered active if and only if:

(1) The question control's ClientTest either is not present or returns true, AND

(2) The question control contains an active prompt object.

A prompt object is considered active if and only if:

(1) The prompt object's ClientTest either is not present or returns true, AND

(2) The prompt object's Count is either not present, or is less than or equal to the Count of the parent question control.
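The activity rules can be written as predicates. In this sketch controls are plain objects with optional `clientTest` functions and `count` fields, and an empty string stands in for the "empty or default value" case. Statement activation, which this section does not define, is assumed to depend only on ClientTest. All function names are hypothetical.

```javascript
// A missing ClientTest counts as true.
const passes = obj => typeof obj.clientTest !== "function" || Boolean(obj.clientTest());

function isPromptActive(prompt, question) {
  return passes(prompt) && (prompt.count === undefined || prompt.count <= question.count);
}

function isQuestionActive(question) {
  return passes(question) && question.prompts.some(p => isPromptActive(p, question));
}

// Assumption: a statement control is active when its ClientTest passes.
function isStatementActive(statement) {
  return passes(statement);
}

// `values` maps control ids to current values; "" or a missing entry
// stands in for an "empty or default value".
function isQAActive(qa, values) {
  if (!passes(qa)) return false;
  const hasActiveItem = qa.items.some(item =>
    item.kind === "statement" ? isStatementActive(item) : isQuestionActive(item));
  if (!hasActiveItem) return false;
  const onlyStatements = qa.items.every(item => item.kind === "statement");
  return onlyStatements ||
    qa.controlsToSpeechEnable.some(id => values[id] === undefined || values[id] === "");
}

const question = { kind: "question", count: 1, prompts: [
  { text: "What's the departure city?" },
  { count: 2, text: "Please say the departure city." },
] };
const qa = { items: [question], controlsToSpeechEnable: ["TextBox1"] };
console.log(isQAActive(qa, { TextBox1: "" }));         // true
console.log(isQAActive(qa, { TextBox1: "Puyallup" })); // false
```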

A QA control is run as follows:

(1) Determine which question control or statement control is active and increment its Count.

(2) If a statement control is active, play the prompt and exit.

(3) If a question control is active, play the prompt and start the Recos for each active answer control and command control.
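The following sketch runs a QA control. It assumes that each question or statement control has already been annotated with its activity (`active`) and its active prompt text. `runQA`, `play` and `startReco` are hypothetical names.

```javascript
// Hypothetical sketch of running a QA control whose items (question or
// statement controls, in source order) carry precomputed activity flags.
function runQA(qa, play, startReco) {
  // (1) find the active question/statement control and increment its Count
  const item = qa.items.find(i => i.active);
  if (!item) return "none";
  item.count = (item.count || 0) + 1;
  play(item.prompt);
  // (2) a statement control only plays its prompt
  if (item.kind === "statement") return "statement";
  // (3) a question control also starts a Reco per active answer and command
  for (const ctl of [...item.answers, ...item.commands]) {
    if (ctl.active) startReco(ctl.id);
  }
  return "question";
}

const prompts = [];
const started = [];
const travelQA = { items: [{
  kind: "question", active: true, prompt: "What's the departure city?",
  answers: [{ id: "A1", active: true }],
  commands: [{ id: "Help", active: true }, { id: "Cancel", active: false }],
}] };
console.log(runQA(travelQA, p => prompts.push(p), id => started.push(id)), started); // question [ 'A1', 'Help' ]
```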



An answer control is considered active if and only if:

(1) The answer control's ClientTest either is not present or returns true, AND

(2) Either:

  a. The answer control was referenced in the active question control's Answers string, OR

  b. The answer control is in Scope.

A command control is considered active if and only if:

(1) It is in Scope, AND

(2) There is not another command control of the same Type lower in the scope tree.
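Scope checks and command overriding can be sketched over a simple child-to-parent map. This assumes that "in Scope" means the active QA control's primary control lies in the subtree of the scope control, and that "lower in the scope tree" means closer to that primary control. All names are hypothetical, and unscoped commands are not modelled.

```javascript
// Hypothetical control tree given as a child id -> parent id map.
function ancestors(id, parentOf) {
  const chain = [];
  for (let node = id; node !== undefined; node = parentOf[node]) chain.push(node);
  return chain; // the control itself first, the root last
}

// "In Scope": the scope control is the active primary control or one of
// its ancestors.
function inScope(scope, activePrimary, parentOf) {
  return scope !== undefined && ancestors(activePrimary, parentOf).includes(scope);
}

function isAnswerActive(answer, activeQuestion, activePrimary, parentOf) {
  const testPasses = typeof answer.clientTest !== "function" || Boolean(answer.clientTest());
  return testPasses &&
    (activeQuestion.answers.includes(answer.id) || inScope(answer.scope, activePrimary, parentOf));
}

// For each command Type, keep only the in-scope command whose scope is
// lowest in the tree (closest to the active primary control).
function activeCommands(commands, activePrimary, parentOf) {
  const chain = ancestors(activePrimary, parentOf);
  const best = new Map();
  for (const cmd of commands) {
    const depth = chain.indexOf(cmd.scope); // 0 means the primary control itself
    if (depth < 0) continue; // not in Scope
    const current = best.get(cmd.type);
    if (!current || depth < chain.indexOf(current.scope)) best.set(cmd.type, cmd);
  }
  return [...best.values()];
}

const parentOf = { TextBox1: "Panel2", Panel2: "Form1", Form1: "Page" };
const commands = [
  { id: "HelpPage", type: "help", scope: "Page" },
  { id: "HelpPanel2", type: "help", scope: "Panel2" },
  { id: "CancelPage", type: "cancel", scope: "Page" },
];
console.log(activeCommands(commands, "TextBox1", parentOf).map(c => c.id)); // [ 'HelpPanel2', 'CancelPage' ]
```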



RunSpeech relies on events to continue driving the dialog; as described so far, it would stop after running a single QA control. Event handlers are included for Prompt.OnComplete, Reco.OnReco, Reco.OnSilence, Reco.OnMaxTimeout, and Reco.OnNoReco. Each of these will be described in turn.

RunSpeechOnComplete works as follows:

(1) If the active Prompt object has an OnClientComplete function specified, it is called.

(2) If the active Prompt object was contained within a statement control, or a question control which had no active answer controls, RunSpeech is called.

RunSpeechOnReco works as follows:

(1) Some default binding happens: the SML tree is bound to the SML attribute and the text is bound to the SpokenText attribute of each control in ControlsToSpeechEnable.

(2) If the confidence value of the recognition result is below the ConfidenceThreshold of the active answer control, the Confirmation logic is run.

(3) Otherwise, if the active answer control has an OnClientReco function specified, it is called, and then RunSpeech is called.

RunSpeechOnReco is responsible for creating and setting the SML, SpokenText and Confidence properties of the ControlsToSpeechEnable. The SML, SpokenText and Confidence properties are then available to scripts at runtime.
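RunSpeechOnReco's three steps, together with the Page-level SML, SpokenText and Confidence name/value maps, can be sketched as follows. The `confirm` and `runSpeech` callbacks and the object shapes are assumptions. Confidence is written alongside SML and SpokenText, as the preceding paragraph describes.

```javascript
// Hypothetical sketch of RunSpeechOnReco. `page` holds the Page-level
// SML, SpokenText and Confidence maps keyed by control id.
function runSpeechOnReco(result, answer, qa, page, confirm, runSpeech) {
  // (1) default binding for every control in ControlsToSpeechEnable
  for (const id of qa.controlsToSpeechEnable) {
    page.SML[id] = result.sml;
    page.SpokenText[id] = result.text;
    page.Confidence[id] = result.confidence;
  }
  // (2) low confidence: run the Confirmation logic instead
  if (typeof answer.confidenceThreshold === "number" &&
      result.confidence < answer.confidenceThreshold) {
    return confirm(qa);
  }
  // (3) otherwise call OnClientReco, if specified, then RunSpeech
  if (typeof answer.onClientReco === "function") answer.onClientReco(result);
  return runSpeech();
}

const pageProps = { SML: {}, SpokenText: {}, Confidence: {} };
const departureQA = { controlsToSpeechEnable: ["TextBox1"] };
const lowConfidence = runSpeechOnReco(
  { sml: "<sml/>", text: "seattle", confidence: 45 },
  { confidenceThreshold: 60 }, departureQA, pageProps,
  () => "confirm", () => "runSpeech");
console.log(lowConfidence, pageProps.SpokenText.TextBox1); // confirm seattle
```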

RunSpeechOnSilence, RunSpeechOnMaxTimeout, and RunSpeechOnNoReco all work the same way:
(1) The appropriate OnClientXXX function is called, if specified.

(2) RunSpeech is called.

Finally, the Confirmation logic works as follows:

(1) If the parent QA control of the active answer control contains any confirm controls, the first active confirm control is found (the activation of a confirm control is determined in exactly the same way as the activation of a question control).

(2) If no active confirm control is found, RunSpeech is called.

(3) Otherwise, the QA control is run, with the selected confirm control as the active question control.
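The Confirmation logic, and the shared shape of RunSpeechOnSilence, RunSpeechOnMaxTimeout and RunSpeechOnNoReco, can be sketched in a few lines. `runConfirmation`, `makeFallbackHandler` and the callbacks are hypothetical. The activity test for confirm controls is passed in, since it mirrors the question-control test.

```javascript
// Hypothetical sketch of the Confirmation logic.
function runConfirmation(qa, isActive, runQAWith, runSpeech) {
  // (1) first active confirm control of the parent QA control, if any
  const chosen = (qa.confirms || []).find(isActive);
  // (2) none active: continue the dialog
  if (!chosen) return runSpeech();
  // (3) run the QA control with the confirm control as the active question
  return runQAWith(qa, chosen);
}

// Shared shape of the OnSilence, OnMaxTimeout and OnNoReco handlers:
// call the OnClientXXX function if specified, then call RunSpeech.
function makeFallbackHandler(onClientFn, runSpeech) {
  return () => {
    if (typeof onClientFn === "function") onClientFn();
    return runSpeech();
  };
}

const cityQA = { confirms: [
  { id: "ConfirmCity", clientTest: () => false },
  { id: "ConfirmCityAgain" },
] };
const isActive = c => typeof c.clientTest !== "function" || Boolean(c.clientTest());
console.log(runConfirmation(cityQA, isActive, (q, c) => `ask ${c.id}`, () => "RunSpeech")); // ask ConfirmCityAgain
```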

For multimodal browsers, only the grammar loading and event dispatching steps are carried out.
APPENDIX B

1 Design principles

In this embodiment, there is no concept of a primary control to speech-enable as there was in Appendix A. The speech layer provides input to the visual layer as well as explicit support for dialog flow management. The semantic layer implements the logic needed for confirmation and validation. In a multimodal interaction, the semantic layer does not need to be used, as confirmation and validation are visual and implemented using standard ASP.NET constructs. If desired, though, the semantic layer can be updated with value changes made through visual or GUI interfaces so that confirmation and validation can still be implemented.

FIG. 12 illustrates the speech controls inheritance diagram.

2 Authoring scenarios

The following provides examples of various forms of application scenarios.

2.1 Multimodal app, tap-and-talk

<speech:QA id="qa1" runat="server">
    <Answers>
        <speech:Answer SemanticItem="siText" ID="answer1"
            XpathTrigger="/sml/value" runat="server">
        </speech:Answer>
    </Answers>
    <Reco StartEvent="textbox1.onmousedown"
        StopEvent="textbox1.onmouseup" ID="reco1"
        Mode="Single">
        <Grammars>
            <speech:Grammar
                Src="http://mysite/mygrammar.grxml"
                ID="Grammar1" runat="server">
            </speech:Grammar>
        </Grammars>
    </Reco>
</speech:QA>

2.2 Multimodal app, click-and-wait-for-recognition

<speech:QA id="qa1" runat="server">
    <Reco id="reco1" StartEvent="textbox1.onmousedown"
        mode="automatic">
        <Grammars>
            <speech:grammar
                src="http://mysite/mygrammar.grxml"
                runat="server"></speech:grammar>
        </Grammars>
    </Reco>
    <Answers>
        <speech:answer id="answer1"
            XpathTrigger="/sml/value" SemanticItem="siText"
            runat="server">
        </speech:answer>
    </Answers>
</speech:QA>

2.3 Multimodal app, do-field

<speech:QA id="qa1" runat="server">
    <Reco id="reco1"
        StartEvent="dofieldButton.onmousedown"
        StopEvent="dofieldButton.onmouseup"
        mode="multiple">
        <Grammars>
            <speech:grammar
                src="http://mysite/mylargegrammar.xml" runat="server">
            </speech:grammar>
        </Grammars>
    </Reco>
    <Answers>
        <speech:answer id="answer1"
            XpathTrigger="/sml/value1" SemanticItem="siOne"
            runat="server">
        </speech:answer>
        <speech:answer id="answer2"
            XpathTrigger="/sml/value2" SemanticItem="siTwo"
            runat="server">
        </speech:answer>
        <speech:answer id="answer3"
            XpathTrigger="/sml/value3" SemanticItem="siThree"
            runat="server">
        </speech:answer>
        <speech:answer id="answer4"
            XpathTrigger="/sml/value4" SemanticItem="siFour"
            runat="server">
        </speech:answer>
        <speech:answer id="answer5"
            XpathTrigger="/sml/value5" SemanticItem="siFive"
            runat="server">
        </speech:answer>
    </Answers>
</speech:QA>

2.4 Voice only app, statement

<speech:QA id="welcome" PlayOnce="true" runat="server">
  <Prompt InLinePrompt="Hello there!"></Prompt>
</speech:QA>

2.5 Voice only app, simple question

<speech:QA id="qa1" runat="server">
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/citygrammar.grxml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Prompt InLinePrompt="Which city do you want to fly to?"></Prompt>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        runat="server">
    </speech:answer>
  </Answers>
</speech:QA>

2.6 Voice only app, question with mixed-initiative
(optional answers)

<speech:QA id="qa1" runat="server">
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/cityANDstate.xml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Prompt InLinePrompt="Which city do you want to fly to?"></Prompt>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        runat="server">
    </speech:answer>
  </Answers>
  <ExtraAnswers>
    <speech:answer id="answer2"
        XpathTrigger="/sml/state" SemanticItem="siState"
        runat="server">
    </speech:answer>
  </ExtraAnswers>
</speech:QA>

2.7 Voice only app, explicit confirmation

<speech:QA id="qa1" runat="server">
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/citygrammar.xml" runat="server">
      </speech:grammar>
    </Grammars>
  </Reco>
  <Prompt InLinePrompt="Which city do you want to fly to?"></Prompt>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        confirmThreshold="0.75" runat="server">
    </speech:answer>
  </Answers>
</speech:QA>

<speech:QA id="qa2" runat="server"
    xpathAcceptConfirms="/sml/accept"
    xpathDenyConfirms="/sml/deny">
  <Prompt InLinePrompt="Did you say
      <SALT:value>textbox1.value</SALT:value>"></Prompt>
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/yes_no_city.xml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Confirms>
    <speech:answer id="answer2"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        confirmThreshold="0.75" runat="server">
    </speech:answer>
  </Confirms>
</speech:QA>

2.8 Voice only app, short time-out confirmation

<speech:QA id="qa1" runat="server"
    xpathAcceptConfirms="/sml/accept"
    xpathDenyConfirms="/sml/deny"
    firstInitialTimeout="500">
  <Prompt InLinePrompt="Did you say
      <SALT:value>textbox1.value</SALT:value>"></Prompt>
  <Reco id="reco1" InitialTimeout="350"
      mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/yes_no_city.grxml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Confirms>
    <speech:answer XpathTrigger="/sml/city"
        SemanticItem="siCity" confirmThreshold="0.75"
        runat="server">
    </speech:answer>
  </Confirms>
</speech:QA>

2.9 Voice only app, commands

<speech:QA id="qa1" runat="server">
  <Prompt id="prompt1" InLinePrompt="Where do you want
      to fly to?"></Prompt>
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/city.grxml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        runat="server"></speech:answer>
  </Answers>
</speech:QA>

<speech:Command id="command1" type="cancel" scope="qa1"
    OnClientCommand="myCommand"
    runat="server"></speech:Command>

<script>
  function myCommand()
  { CallControl.Hangup(); }
</script>

2.10 Voice only app, prompt selection

<speech:QA id="qa1" runat="server">
  <Prompt id="prompt1" InLinePrompt="Where do you want
      to fly to?"></Prompt>
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/city.grxml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        runat="server"></speech:answer>
  </Answers>
</speech:QA>

<speech:Command id="command1" type="cancel" scope="qa1"
    OnClientCommand="myCommand"
    runat="server"></speech:Command>

<script>
  function myCommand()
  { CallControl.Hangup(); }
</script>

< speech : qa id= "qal " runat = " server" > 
45 < Prompt id=" prompt 1" 

PromptSelectFunction="promptSelection" / > 
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<Reco id= " recol " mode= 11 automatic 11 > 
< Grammar s> 

< speech: grammar src="http : //mysite/city .xml 11 
runat=" server" ></ speech : grammar > 
5 < /Grammars > 

</Reco> 
<Answers> 

<speech : answer id="answerl " 
XpathTrigger= " /sml/city " Semant id tem= " siCity " 
10 runat = " server" ></speech : answer> 

< /Answers > 
</speech:qa> 

<script>
  function promptSelection(lastCommandOrException, count,
      answerArray)
  {
    if (lastCommandOrException == "Silence")
    {
      return "Sorry, I couldn't hear you. Please speak " +
          "louder. Where do you want to fly to?";
    }
    else if (count > 3)
    {
      return "Communication problems are preventing me " +
          "from hearing the arrival city. Please try again later.";
    }
    return "Where do you want to fly to?"; // Default prompt
  }
</script>

2.11 Voice only app, implicit confirmation

<speech:qa id="qa1" runat="server"
    xpathDenyConfirms="/sml/deny"
    xpathAcceptConfirms="/sml/accept">
  <Prompt id="prompt1"
      PromptSelectFunction="promptSelection"></Prompt>
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/yes_no_city.xml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/date" SemanticItem="siDate"
        runat="server"></speech:answer>
  </Answers>
  <Confirms>
    <speech:answer id="confirm1"
        XpathTrigger="/sml/city" SemanticItem="siCity"
        runat="server"></speech:answer>
  </Confirms>
</speech:qa>

<script>
  function promptSelection(lastCommandOrException, count,
      SemanticItemList)
  {
    var myPrompt = "";
    if (SemanticItemList["siCity"].value != null)
    {
      myPrompt = "Flying from " +
          SemanticItemList["siCity"].value + ". ";
      myPrompt += "On what date?";
    }
    else
    {
      myPrompt = "On what date?";
    }
    return myPrompt;
  }
</script>

2.12 Voice only app, QA with reco and dtmf

<speech:qa id="qa1" runat="server">
  <Prompt id="prompt1" InLinePrompt="Press or say one if
      you accept the charges, two if you don't."></Prompt>
  <Reco id="reco1" mode="automatic">
    <Grammars>
      <speech:grammar
          src="http://mysite/acceptCharges.xml"
          runat="server"></speech:grammar>
    </Grammars>
  </Reco>
  <Dtmf smlContext="sml/accept"></Dtmf>
  <Answers>
    <speech:answer id="answer1"
        XpathTrigger="/sml/accept"
        SemanticItem="siAccept"
        runat="server"></speech:answer>
  </Answers>
</speech:qa>

2.13 Voice-only app, Record-only QA

<speech:qa id="qa1" runat="server">
  <Answers>
    <speech:answer id="a1"
        XpathTrigger="/SML/@recordlocation"
        SemanticItem="foo"
        runat="server"></speech:answer>
  </Answers>
  <Reco id="recordonly">
    <record beep="true"></record>
  </Reco>
</speech:qa></FORM>

3 Design details 

3.1 QA activation (voice-only)

QAs are tested for activeness in SpeechIndex order (see
run-time behavior).

A QA is active when ClientActivationFunction returns true
AND:

   if the Answers array is non-empty, the SemanticItems
   pointed to by the set of Answers are empty; OR
   if the Answers array is empty, at least one item in the
   Confirms array needs confirmation.

A QA can have only Answers (normal question: Where do you
want to go?), only Confirms (explicit confirmation: Did you
say Boston? or short time-out confirmation: Boston.), both
(implicit confirmation: When do you want to fly to Boston?)
or none (statement: Welcome to my application!).

A QA can have extra answers even if it has no answers
(e.g., mixed initiative).
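The activation rule above can be sketched in JavaScript, the language of the client-side scripts in this document. This is a minimal illustration, not the RunSpeech implementation: the object shapes (answers, confirms, semanticItem.value, needsConfirmation) and the function names are hypothetical, and "the SemanticItems pointed to by the set of Answers are empty" is read here, as an assumption, as "at least one of them is still empty".

```javascript
// Hypothetical sketch of the QA activation rule (section 3.1).
// Object shapes and names are illustrative assumptions.
function isEmptyItem(item) {
  return item.value === null || item.value === undefined || item.value === "";
}

// ctx carries the arguments a ClientActivationFunction receives (section 8.1).
function isQAActive(qa, ctx = {}) {
  // ClientActivationFunction defaults to true when not specified.
  const activationOk = qa.clientActivationFunction
    ? qa.clientActivationFunction(ctx.lastActiveObj, ctx.lastCommandOrException, ctx.count) === true
    : true;
  if (!activationOk) return false;

  if (qa.answers.length > 0) {
    // Assumption: active while any SemanticItem filled by an Answer is still empty.
    return qa.answers.some((answer) => isEmptyItem(answer.semanticItem));
  }
  // No Answers: active only if some Confirm target still needs confirmation.
  return qa.confirms.some((confirm) => confirm.semanticItem.needsConfirmation === true);
}
```

Under this literal reading, a statement QA with neither Answers nor Confirms never becomes active; how such QAs are scheduled (for example with PlayOnce) is not modeled by the sketch.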
3.2 Answer, Confirm

Upon recognition, commands are processed first, followed by 
Answers, ExtraAnswers and Confirms. 

A target element (e.g., textbox1.value) can be in one of
these states: empty, invalid, needsConfirmation, confirmed.
A target is empty before any recognition result is
associated with it, or if it has been cleared.
A target is in the needsConfirmation state when a recognition
result has been associated with it, but the confidence
level is below the confirmThreshold for this item. A
target is confirmed when either a recognition result has
been associated with it with a sufficiently high confidence
level, or a confirmation loop has explicitly set it to this
state.

Answers are therefore responsible for setting the value in
the target element and the confidence level (this is done
in a semantic layer). Confirms are responsible for
confirming the item, clearing it or setting it to a new
value (with a new confidence level).
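The state transitions described above can be modeled as a small state machine. The sketch below is hypothetical: the function names, the outcome strings "accept", "deny" and "correct", and the choice that a confidence equal to the threshold counts as confirmed are assumptions made for illustration. The invalid state, which a validator would set, is not modeled.

```javascript
// Hypothetical model of a target element's states (section 3.2):
// "empty", "needsConfirmation", "confirmed" ("invalid" is not modeled).
function createTarget(confirmThreshold) {
  return { state: "empty", value: null, confidence: 0, confirmThreshold };
}

// An Answer sets the value and confidence; low confidence needs confirmation.
function applyAnswer(target, value, confidence) {
  target.value = value;
  target.confidence = confidence;
  target.state = confidence < target.confirmThreshold ? "needsConfirmation" : "confirmed";
}

// A Confirm can accept the item, clear it, or replace it with a new value.
function applyConfirm(target, outcome, newValue, newConfidence) {
  if (outcome === "accept") {
    target.state = "confirmed";
  } else if (outcome === "deny") {
    target.value = null;
    target.confidence = 0;
    target.state = "empty";
  } else if (outcome === "correct") {
    applyAnswer(target, newValue, newConfidence);
  } else {
    throw new Error("Unknown confirm outcome: " + outcome);
  }
}
```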

3.3 Command execution (and scope) 

Commands specify a scope and are active for all QAs within
that scope. The default processing of a command is to set
the current QA's lastCommandOrException to the command's
type. If the command specifies a Grammar, this grammar is
activated in parallel with any grammars in the current Reco
object. QAs can be modal (AllowCommands=false), in which
case no commands will be processed for that particular QA.
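A minimal sketch of command scoping, assuming each QA records its own id and the ids of its enclosing containers, so that a command's scope (such as scope="qa1" in the authoring examples) can be tested by membership. The field and function names are hypothetical; the command type is stored in a field named lastCommandOrException, the name used in section 8.1.

```javascript
// Hypothetical sketch of command scoping (section 3.3).
// Assumes each QA knows its own id and the ids of its enclosing containers.
function activeCommands(qa, commands) {
  if (qa.allowCommands === false) return []; // modal QA: no commands processed
  const inScope = new Set([qa.id, ...qa.containerIds]);
  return commands.filter((cmd) => inScope.has(cmd.scope));
}

// Grammars listened for while this QA runs: its own Reco grammars plus
// the grammars of in-scope commands, activated in parallel.
function grammarsForQA(qa, commands) {
  const commandGrammars = activeCommands(qa, commands)
    .filter((cmd) => cmd.grammar)
    .map((cmd) => cmd.grammar);
  return [...qa.recoGrammars, ...commandGrammars];
}

// Default processing: record the command type on the current QA.
function processCommand(qa, cmd) {
  qa.lastCommandOrException = cmd.type;
}
```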

3.4 Validators 

A CompareValidator will be active when the value of the
SemanticItemToValidate it refers to has not been validated
by this validator. If SemanticItemToCompare is specified
(rather than ValueToCompare), then the CompareValidator
will only be active if the value of the
SemanticItemToCompare is non-empty (i.e., if it has been
assigned a value by a previous QA).

A CustomValidator will be active when the value of the
SemanticItemToValidate it refers to has not been validated
by this validator.
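The two activation conditions can be sketched as predicates. As an assumption for illustration, each SemanticItem keeps a set of the ids of validators that have already validated its current value; that set and the function names are hypothetical.

```javascript
// Hypothetical sketch of validator activation (section 3.4).
// Assumes item.validatedBy is a Set of validator ids for the current value.
function notYetValidated(validator) {
  return !validator.semanticItemToValidate.validatedBy.has(validator.id);
}

function isCompareValidatorActive(validator) {
  if (!notYetValidated(validator)) return false;
  // When comparing against another SemanticItem (rather than ValueToCompare),
  // that item must already hold a value assigned by a previous QA.
  if (validator.semanticItemToCompare) {
    const v = validator.semanticItemToCompare.value;
    return v !== null && v !== undefined && v !== "";
  }
  return true;
}

function isCustomValidatorActive(validator) {
  return notYetValidated(validator);
}
```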

4 Run time behavior



4.1 Client detection 

The speech controls take into account the kind of client
they are rendering for. If the client doesn't
support SALT, the controls won't render any speech-related
tags or script. Client detection is done by checking the
browser capabilities and detecting whether it's a voice-only
client (the browser is Quadrant) or a multimodal client (IE,
Pocket IE, etc., with SALT support).

Hands-free is not a mode in the client, but rather an
application-specific modality, and therefore the only
support required is SALT (as in multimodal). Hands-free
operation is therefore switched on by application logic.
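The client classification above reduces to a simple decision. The capability object passed in here (supportsSalt, browser) is a hypothetical stand-in for whatever browser-capability data the server actually inspects; it is not the ASP.NET browser-capabilities API.

```javascript
// Hypothetical sketch of the client classification in section 4.1.
// The caps fields are assumptions for illustration only.
function classifyClient(caps) {
  if (!caps.supportsSalt) return "none";              // render no speech tags or script
  if (caps.browser === "Quadrant") return "voice-only";
  return "multimodal";                                 // e.g. IE or Pocket IE with SALT
}

// Hands-free is an application choice; the only client support needed is SALT.
function canRunHandsFree(caps) {
  return caps.supportsSalt === true;
}
```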

4.2 Multimodal 

Support for multimodal applications is built into the speech
controls. In multimodal operation, commands, DTMF, confirms,
prompts, etc. do not make sense from an interaction point of
view, so they are not rendered. Tap-and-talk (or any other
type of interaction, like click-and-wait-for-recognition)
is enabled by hooking up the calls to start and stop
recognition with GUI events using the Reco object
attributes startElement/startEvent and
stopElement/stopEvent, plus the Reco object mode attribute.

During render time, the speech controls are passed
information specifying whether the client is a voice-only
client or a multimodal client. If the client is multimodal,
the rendering process hooks the call to start recognition
to the GUI event specified by the StartEvent attribute of
the Reco object. The rendering process also hooks the call
to stop recognition to the GUI event specified by the
StopEvent attribute of the Reco object.
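The hookup of StartEvent and StopEvent can be sketched as follows, assuming the attribute values take the element.event form used in the authoring examples (e.g. "textbox1.onmousedown") and that the recognizer exposes start() and stop() methods. The recognizer object, the elementsById map and the function names are hypothetical; assigning to the element's on... property mirrors ordinary DOM event-handler properties, which is also why the event name's case must match exactly.

```javascript
// Hypothetical sketch: wire a Reco's StartEvent/StopEvent strings such as
// "textbox1.onmousedown" to start()/stop() calls on a recognizer object.
function parseEventSpec(spec) {
  const dot = spec.indexOf(".");
  if (dot < 0) throw new Error("Expected element.event, got " + spec);
  return { elementId: spec.slice(0, dot), handler: spec.slice(dot + 1) };
}

function hookReco(reco, recognizer, elementsById) {
  const attach = (spec, action) => {
    if (!spec) return; // e.g. click-and-wait has no StopEvent
    const { elementId, handler } = parseEventSpec(spec);
    const el = elementsById[elementId];
    if (!el) throw new Error("No element with id " + elementId);
    // Handler name is used verbatim, so "onMouseDown" would not match "onmousedown".
    el[handler] = () => recognizer[action]();
  };
  attach(reco.StartEvent, "start");
  attach(reco.StopEvent, "stop");
}
```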

The multimodal client needs a mechanism which will invoke
author-specified functions to handle speech-related events
(e.g., timeouts) or recognition processing. This mechanism
is the Multimodal.js script. Multimodal.js is specified in
an external script file and loaded by a single line
generated by server-side rendering, e.g.,

<script language="javascript"
    src="/scripts/Multimodal.js" />

This method mirrors the ASP.NET way of generating 'system'
client-side script loaded via URI. Linking to an external
script is functionally equivalent to specifying it inline,
yet is more efficient, since clients are able to cache the
file, and cleaner, since the page is not cluttered with
generic functions.

4.3 Voice-only

4.3.1 Runtime script (RunSpeech)

Unlike in a multimodal interaction, where the user
initiates all speech input by clicking/selecting visual
elements in the GUI, voice-only clients need a mechanism
that provides them with the information necessary to
properly render speech-enabled ASP.NET pages. Such a
mechanism must guarantee the execution of dialog logic and
maintain the state of user prompting and grammar activation
as specified by the author.

The mechanism used by the Speech Controls is a client-side
script (RunSpeech.js) that relies upon the SpeechIndex
attribute of the QA control, plus the flow control
mechanisms built into the framework
(ClientActivationFunction, default activation rules, etc.).

RunSpeech is loaded via URI, similar to the loading
mechanism of Multimodal.js described above.

4.3.2 SpeechIndex

SpeechIndex is an absolute ordering index within a naming
container.

If more than one speech control has the same SpeechIndex,
they are activated in source order. In situations where
some controls have SpeechIndex specified and some controls
do not, those with SpeechIndex will be activated first,
then the rest in source order.

NOTE: SpeechIndex is automatically set to 0 for new
controls. Dialog designers should leave room in their
numbering scheme to insert new QAs later, for example by
beginning with a midrange integer and incrementing by 100:
number QAs 1000, 1100, 1200 instead of 1, 2, 3. This leaves
room for a large number of QAs at any point in the dialog
and plenty of room to add QAs at the beginning.
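The ordering rules can be expressed as a stable sort. This sketch assumes each control object carries a numeric speechIndex field (0 meaning unspecified) and that the input array is in source order; the field and function names are hypothetical.

```javascript
// Hypothetical sketch of SpeechIndex ordering (sections 4.3.2 and 8.1).
// Controls with SpeechIndex > 0 run first, ascending; ties keep source order.
// Controls with SpeechIndex 0 (the default) follow in source order.
function speechOrder(controls) {
  const withIndex = controls
    .map((control, sourcePos) => ({ control, sourcePos }))
    .filter((e) => e.control.speechIndex > 0)
    .sort((a, b) => a.control.speechIndex - b.control.speechIndex || a.sourcePos - b.sourcePos);
  const withoutIndex = controls.filter((c) => !(c.speechIndex > 0));
  return [...withIndex.map((e) => e.control), ...withoutIndex];
}
```

With the numbering scheme suggested in the note, QAs numbered 1000, 1100 and 1200 run in that order, and a QA added later with index 1050 slots in between without renumbering.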

4.3.3 ClientActivationFunction 

ClientActivationFunction specifies a client-side script
function which returns a boolean value to determine when
this control is considered available for selection by the
run-time control selection algorithm. If not specified, it
defaults to true (control is active).

The system strategy can therefore be changed by using this
function as a condition to activate or deactivate QAs with
finer control than SpeechIndex alone provides. If not
specified, the QA is considered available for activation.
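A sketch of how a run-time selection loop might consult ClientActivationFunction, assuming the controls are already in SpeechIndex order and that the built-in activation rules are supplied as a separate predicate (defaultRule). The names and the context object are hypothetical; the three arguments passed follow the signature given in section 8.1.

```javascript
// Hypothetical sketch: choose the next control to run from a list already
// in SpeechIndex order. A missing ClientActivationFunction counts as true.
function selectNextControl(orderedControls, context, defaultRule) {
  for (const control of orderedControls) {
    const fn = control.clientActivationFunction;
    const allowedByAuthor = fn
      ? fn(context.lastActiveObj, context.lastCommandOrException, context.count) === true
      : true;
    if (allowedByAuthor && defaultRule(control)) return control;
  }
  return null; // nothing left to run
}
```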

4.3.4 Count

Count is a property of the QA control that indicates how
many times that control has been activated consecutively.
The Count property is reset if the previously active
QA is different from the current QA (the same applies to
Validators); otherwise, it is incremented by one. The Count
property is exposed to application developers through the
PromptSelectFunction of the Prompt object.
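The Count bookkeeping can be sketched with a closure. The text says only that Count is "reset"; restarting at 1 for a newly active control is an assumption here, consistent with the statement in section 8.1 that Count starts at 1. The function name is hypothetical.

```javascript
// Hypothetical sketch of the consecutive-activation Count (section 4.3.4).
// Assumption: a reset restarts the count at 1 for the newly active control.
function makeCountTracker() {
  let previous = null;
  let count = 0;
  return function activate(control) {
    count = control === previous ? count + 1 : 1;
    previous = control;
    return count;
  };
}
```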
Controls reference
General Authoring Notes 

1. Script references are not validated at render time.

The Speech Controls and objects described in this section
contain attributes whose values are references to script
functions written by the dialog author. These functions are
executed on client devices in response to speech-related
events (e.g., expiration of a timeout) or as run-time
processing (e.g., modification of prompt text prior to
playback). Render-time validation is not performed on
script references, i.e., no check for the existence of
script functions is done during rendering of controls. If an
attribute contains a reference to a client-side script
function and the function does not exist, client-side
exceptions will be thrown.

In voice-only mode, script functions generating exceptions
during runtime will cause a redirection to the error page
defined in the Web.config file. If no error page is
defined, RunSpeech will continue to execute without
reporting the exception.

2. All Speech Controls should be contained within an ASP.NET
<form> tag or equivalent.

The Speech Controls described in this section must all be
placed in ASP.NET web pages inside the <form> tag. Behavior
of controls placed outside the <form> tag is undefined.

3. Client-side script references must refer to a function
name and not include parentheses.

Using the PromptSelectFunction as an example, the following
is correct syntax:
<Prompt id="P1" PromptSelectFunction="mySelectFunction" />
//using "mySelectFunction()" is incorrect syntax
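Because references are not checked at render time, any check happens only when the client tries to call the function. The sketch below shows one way such a run-time lookup could enforce notes 1 and 3; the function name resolveScriptReference and the scope parameter (standing in for the page's global object) are hypothetical.

```javascript
// Hypothetical run-time resolution of a script reference such as
// PromptSelectFunction="mySelectFunction". Nothing is checked at render
// time, so a bad reference only surfaces here, on the client.
function resolveScriptReference(name, scope) {
  // Must be a bare function name: "mySelectFunction", not "mySelectFunction()".
  if (!/^[A-Za-z_$][\w$]*$/.test(name)) {
    throw new Error("Script reference must be a function name without parentheses: " + name);
  }
  const fn = Object.prototype.hasOwnProperty.call(scope, name) ? scope[name] : undefined;
  if (typeof fn !== "function") {
    throw new Error("Script function not found: " + name);
  }
  return fn;
}
```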

4. IE requires exact case when running JScript.

Therefore, the case of event values specified in the
StartEvent and StopEvent attributes of the Reco object
must exactly match how those events are defined. This
happens to be all lowercase letters for most standard IE
events. For example, the onmouseup and onmousedown events
must be specified in all lowercase letters.

5. All Speech Controls expose the common attribute id. 

6. Behavior of visible and enabled properties of Speech
Controls.

Setting the visible or enabled properties of Speech Controls
to "False" will cause them not to render.

7. Minimum client requirements

In one embodiment, clients must be running IE 6.0 or greater
and JScript 5.5 or greater for speech controls and
associated script functions to work properly.

8. Rendering <smex> to telserver

The speech controls automatically handle rendering <smex>
tags to the telephony server on every page, as is required
by the server. In one embodiment, smex tags are rendered
whether the client is the telephony server or the desktop
client.

5 Global Application Settings 

Speech Controls provide mechanisms that allow dialog 
authors to specify values to control properties on an 
application or page basis. 
5.1 Application-level settings

5.1.1 Application global variables 

Dialog authors may use their application's Web.config file
to set values of global variables for speech-enabled web
applications. The values of the global variables persist
throughout the entire lifetime of the web application.
'Errorpage' is the only global variable that may be
specified, and it is set for the application during render
time.

<appSettings>
  <add key="errorpage" value="..." />
</appSettings>

The <appSettings> tag must be placed one level inside the
<configuration> tag within the Web.config file.

The errorpage key specifies a URI to a default error page.
Redirection to this error page will occur during run time
when the speech platform or the DTMF engine returns an
error. A default error page is included with the SDK; the
user can also create a custom error page.

Note: Developers who create their own error page must call
window.close() at the bottom of the error page in the
voice-only case in order to release the call.

5.1.2 Application-level setting of common control
properties

Dialog authors may use their application's Web.config file
to set values of common control properties and have those
values persist during the lifetime of the web application.
For example, an author may wish to use the Web.config file to
set the maxTimeout value for Reco objects in their
application. The properties are set in the Web.config file
using the following syntax:

<configuration>
  <SpeechStyleSheet>
    <Style id="style1">
      <QA allowCommands="false">
        <Prompt bargein="false" ... />
        <Reco maxTimeout="5000" ... />
        <Dtmf preFlush="true" ... />
        <Answers confirmThreshold="0.80" ... />
        <ExtraAnswers confirmThreshold="0.80" ... />
        <Confirms confirmThreshold="0.80" ... />
      </QA>
      <Command ... />
      <CustomValidator ... />
      <CompareValidator ... />
      <SemanticItem ... />
    </Style>
  </SpeechStyleSheet>
</configuration>

The corresponding Reco object would reference the
"style1" Style:

<Reco id="reco1" ... StyleReference="style1" ... />

If the Style id is "globalStyle," the property values set
in the Style apply application-wide to pertinent controls.
So, in the above example, if id="" (or the property is
omitted from the Style tag), a maxTimeout of 5000
milliseconds will be used for all Reco objects in the
application (unless overridden).

For a complete list of properties which are settable
through the SpeechStyleSheet, see below.
6 Stylesheet Control

The Stylesheet control allows dialog authors to set values
of common control properties at a page-level scope. The
Stylesheet control is a collection of Style objects. The
Style object exposes properties of each control that are
settable on a page-level basis. The Stylesheet control is
rendered for both multimodal and voice-only modes. An
exception will be thrown if the Stylesheet control contains
an object which is not a Style object.

class Stylesheet : SpeechControl
{
    id {get; set;};
    Styles {get;};
}

6.1 Stylesheet properties
Styles

Optional. Used in both multimodal and voice-only modes. The
Styles property is a collection of Style objects used to
set property values for Speech Controls and their objects.
The property values last during the lifetime of the current
page.

7 Style Object

The Style object is used to set property values for Speech
Controls and their objects. The property values last during
the lifetime of the current page.

class Style : Control
{
    string id {get; set;};
    string StyleReference {get; set;};
    QAStyle QA {get; set;};
    CommandStyle Command {get; set;};
    CustomValidatorStyle CustomValidator {get; set;};
    CompareValidatorStyle CompareValidator {get; set;};
    SemanticItemStyle SemanticItem {get; set;};
}

7.1 Style properties
id

Required. The programmatic name of the Style object.

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Stylesheet control will search for the named Style object
and also set property values specified in the named Style.
An exception is thrown for an invalid StyleReference.

For every property of a speech control with a
StyleReference, the value is taken from the first of the
following sources that specifies it:

1. the value set directly in the speech control
2. the style object directly referenced
3. any style referenced by a style
4. the global style object
5. the speech control default value.
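The five-step lookup order can be sketched as a function. Styles are modeled, as an assumption, as objects keyed by id with an optional styleReference and per-control-type property maps; the global style is looked up under the id "globalStyle" mentioned in section 5.1.2. The visited set that stops reference cycles is an addition not described in the text, and all names are hypothetical.

```javascript
// Hypothetical sketch of the StyleReference lookup order (section 7.1).
// A style is { id, styleReference, props: { [controlType]: { [prop]: value } } }.
function resolveProperty(control, controlType, prop, stylesById, defaults) {
  // 1. Value set directly on the control.
  if (control.props[prop] !== undefined) return control.props[prop];

  // 2-3. The directly referenced style, then any style it references in turn.
  const seen = new Set(); // guards against reference cycles (an assumption)
  let styleId = control.styleReference;
  while (styleId && !seen.has(styleId)) {
    seen.add(styleId);
    const style = stylesById[styleId];
    if (!style) throw new Error("Invalid StyleReference: " + styleId);
    const value = style.props[controlType] && style.props[controlType][prop];
    if (value !== undefined) return value;
    styleId = style.styleReference;
  }

  // 4. The global style object, if one is defined.
  const global = stylesById["globalStyle"];
  const globalValue = global && global.props[controlType] && global.props[controlType][prop];
  if (globalValue !== undefined) return globalValue;

  // 5. The control's built-in default.
  return defaults[prop];
}
```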

The following example shows two QA properties being set
using StyleReference:

<speech:Stylesheet id="SS">
  <speech:Style id="base_style">
    <QA OnClientActive="myOnClientActive"/>
  </speech:Style>
  <speech:Style id="derived_style"
      StyleReference="base_style">
    <QA PlayOnce="true"/>
  </speech:Style>
</speech:Stylesheet>
QA

Optional. The QA property of the Style object is used to
set property values for all QA controls on a page that
reference this Style. The following example shows how to
set the AllowCommands and PlayOnce properties for the QA
controls that reference this Style:

<speech:Stylesheet id="SS1">
  <speech:Style id="WelcomePageQA_Style">
    <QA AllowCommands="false" PlayOnce="true" />
  </speech:Style>
</speech:Stylesheet>

<QA id="..." StyleReference="WelcomePageQA_Style" .../>

The next example shows how to set the bargein property of
all Prompt objects on a given page using Params:

<speech:Stylesheet id="SS2">
  <Style Name="Style1">
    <QA>
      <Answers ConfirmThreshold="0.8" Reject="0.4" />
      <Prompt>
        <Params>
          <Param name="BargeinType" value="grammar" />
          <Param name="foo" value="bar" />
        </Params>
      </Prompt>
    </QA>
  </Style>
</speech:Stylesheet>
Command 

Optional. The Command property of the Style object is used
to set property values for all Command controls on a page
that reference this Style.
CustomValidator

Optional. The CustomValidator property of the Style object
is used to set property values for all CustomValidator
controls on a page that reference this Style.

CompareValidator

Optional. The CompareValidator property of the Style object
is used to set property values for all CompareValidator
controls on a page that reference this Style.

SemanticItem

Optional. The SemanticItem property of the Style object is
used to set property values for all SemanticItem controls
on a page that reference this Style.

The following properties may be set using the Style object.

QA Properties

AllowCommands
PlayOnce
XpathAcceptConfirms
XpathDenyConfirms
AcceptRejectThreshold
DenyRejectThreshold
FirstInitialTimeout
ConfirmByOmission
ConfirmIfEqual
OnClientActive
OnClientListening
OnClientComplete

Prompt Properties

These apply to Prompts in QA, CompareValidator,
CustomValidator and Command controls.

Bargein
OnClientBookmark
OnClientError
Prefetch
Type
Lang
Params
Reco Properties

StartEvent
StopEvent
Mode
InitialTimeout
BabbleTimeout
MaxTimeout
EndSilence
Reject
OnClientSpeechDetected
OnClientSilence
OnClientNoReco
OnClientError
Lang
Params

Grammar Properties

These apply to both Reco and Dtmf grammars.

Type
Lang

Dtmf Properties

InitialTimeout
InterDigitTimeout
OnClientSilence
OnClientKeyPress
OnClientError
Params

Answer Properties

These apply to the Answers, ExtraAnswers and Confirms
collections.

ConfirmThreshold
Reject

Command Properties

Scope
AcceptCommandThreshold

CompareValidator Properties

ValidationEvent
Operator
Type
InvalidateBoth

CustomValidator Properties

ValidationEvent

SemanticItem Properties

BindOnChange



8 QA control

The QA control is responsible for querying the user with a
prompt, starting a corresponding recognition object and
processing recognition results.

The QA control is rendered for both multimodal and
voice-only modes.

class QA : IndexedStyleReferenceSpeechControl
{
    string id {get; set;};
    int SpeechIndex {get; set;};
    string ClientActivationFunction {get; set;};
    string OnClientActive {get; set;};
    string OnClientComplete {get; set;};
    string OnClientListening {get; set;};
    bool AllowCommands {get; set;};
    bool PlayOnce {get; set;};
    string XpathAcceptConfirms {get; set;};
    string XpathDenyConfirms {get; set;};
    float AcceptRejectThreshold {get; set;};
    float DenyRejectThreshold {get; set;};
    float FirstInitialTimeout {get; set;};
    string StyleReference {get; set;};
    bool ConfirmByOmission {get; set;};
    bool ConfirmIfEqual {get; set;};

    AnswerCollection Answers {get;};
    AnswerCollection ExtraAnswers {get;};
    AnswerCollection Confirms {get;};

    Prompt Prompt {get;};
    Reco Reco {get;};
    Dtmf Dtmf {get;};
};
8.1 QA Properties

All properties of the QA control are available to the
application developer at design time.

SpeechIndex

Optional. Default is zero, which is equivalent to no
SpeechIndex. Only used in voice-only mode. Specifies the
activation order of speech controls on a page and the
activation order of composite controls. All controls with
SpeechIndex > 0 will be run first, and then controls with
SpeechIndex = 0 will be run in source order. If more than
one control has the same SpeechIndex, they are activated in
source order. In situations where some controls specify
SpeechIndex and some controls do not, those with
SpeechIndex specified will be activated first, then the
rest in source order. SpeechIndex values start at 1. An
exception will be thrown for invalid values of
SpeechIndex.

ClientActivationFunction

Optional. Only used in voice-only mode. Specifies a
client-side script function which returns a Boolean value to
determine when a QA control is considered available for
selection by the run-time control selection algorithm. If
not specified, it defaults to true (control is active). The
signature for ClientActivationFunction is as follows:

bool ClientActivationFunction(object lastActiveObj,
    string lastCommandOrException, int count)

where:

lastActiveObj is the last active control, e.g., QA,
CustomValidator or CompareValidator. For the first
activated QA on a page, lastActiveObj will be null.

lastCommandOrException is a Command type (e.g., "Help") or
a Reco event (e.g., "Silence" or "NoReco") of the last
active control. For the first activated QA on a page, or if
the last active control is a validator,
lastCommandOrException will be an empty string.

count is the number of times the last active QA has been
activated consecutively, 1 if this is the first active QA on
the page. Count starts at 1 and has no limit. However, for
the first activated QA on a page, count will be set to zero.
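As an illustration of the signature above, the following hypothetical ClientActivationFunction would make a help QA available only after the user asks for help or after repeated silence. The function name activateHelpQA and the threshold of two consecutive activations are assumptions, not part of the controls.

```javascript
// Hypothetical ClientActivationFunction for a help QA, following the
// signature above: (lastActiveObj, lastCommandOrException, count) -> boolean.
function activateHelpQA(lastActiveObj, lastCommandOrException, count) {
  // First QA on the page: lastActiveObj is null and there is nothing to help with.
  if (lastActiveObj === null) return false;
  // Offer help after an explicit "Help" command...
  if (lastCommandOrException === "Help") return true;
  // ...or after the previous QA has met silence on two consecutive activations.
  return lastCommandOrException === "Silence" && count >= 2;
}
```

It would be referenced by name, without parentheses, for example ClientActivationFunction="activateHelpQA" on the help QA.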

OnClientActive

Optional. Used in both multimodal and voice-only modes.

Specifies a client-side script that will be called after
RunSpeech determines this QA is active (voice-only mode) or
after the StartEvent is fired (in multimodal) and before
processing the QA (e.g., playing a prompt or starting
recognition). The OnClientActive function does not return
values. The signature for OnClientActive is as follows:

function OnClientActive(string eventsource, string
lastCommandOrException, int Count, object SemanticItemList)

where:

eventsource is the id of the object (specified by
Reco.StartEvent) whose event started the Reco associated
with the QA (for multimodal). eventsource will be null in
voice-only mode.

lastCommandOrException is a Command type (e.g., "Help") or
a Reco event (e.g., "Silence" or "NoReco") for voice-only
mode. lastCommandOrException is the empty string for
multimodal.

Count is the number of times the QA has been activated
consecutively. Count starts at 1 and has no limit for
voice-only mode. Count is zero for multimodal.

SemanticItemList is, for voice-only mode, an associative
array that maps semantic item ids to semantic item
objects. For multimodal, SemanticItemList will be null.
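A minimal OnClientActive handler might use the parameter conventions above to tell the two modes apart. Because the handler returns no value, this sketch records what it observes in a hypothetical `activationLog` array. It also assumes that semantic item objects expose a `State` string such as "Confirmed" or "Empty"; that property shape is an assumption, not something this section defines.

```javascript
// Hypothetical page-level record of what OnClientActive observed.
const activationLog = [];

// Sketch of an OnClientActive handler. It returns no value, so it records its
// observations instead. Semantic item objects are assumed to expose a `State`
// string such as "Confirmed" or "Empty".
function OnClientActive(eventsource, lastCommandOrException, Count, SemanticItemList) {
  // SemanticItemList is null in multimodal mode and an id -> item map in
  // voice-only mode.
  const voiceOnly = SemanticItemList !== null;
  const unconfirmed = voiceOnly
    ? Object.keys(SemanticItemList).filter(id => SemanticItemList[id].State !== "Confirmed")
    : [];
  activationLog.push({
    mode: voiceOnly ? "voice-only" : "multimodal",
    eventsource,
    lastCommandOrException,
    Count,
    unconfirmed,
  });
}
```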

OnClientComplete

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script that will be called after
execution of a QA (successfully or not) and before passing
dialog control back to the RunSpeech algorithm (in
voice-only) or the end user (in multimodal). The
OnClientComplete function is called before postbacks to the
server for QAs whose AutoPostBack attribute of the Answer
object is set to true. The OnClientComplete function does
not return values. The signature for OnClientComplete is as
follows:

function OnClientComplete(string eventsource, string
lastCommandOrException, int Count, object SemanticItemList)

where:

eventsource is the id of the object (specified by
Reco.StopEvent) whose event stopped the Reco associated
with the QA (for multimodal). eventsource will be null in
voice-only mode.

lastCommandOrException is a Command type (e.g., "Help") or
a Reco event (e.g., "Silence" or "NoReco") for voice-only
mode. lastCommandOrException is the empty string for
multimodal.

Count is the number of times the QA has been activated
consecutively. Count starts at 1 and has no limit for
voice-only mode. Count is zero for multimodal.

SemanticItemList is, for voice-only mode, an associative
array that maps semantic item ids to semantic item
objects. For multimodal, SemanticItemList will be null.

OnClientListening

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script (function) that will be
called after the reco object starts successfully. Its main
use is to let the GUI change to show the user that they can
start speaking. The function does not return any values.
The signature for OnClientListening is as follows:

function OnClientListening(string eventsource, string
lastCommandOrException, int Count, object SemanticItemList)

where:

eventsource is the id of the object (specified by
Reco.StartEvent) whose event started the Reco associated
with the QA (for multimodal). eventsource will be null in
voice-only mode.

lastCommandOrException is a Command type (e.g., "Help") or
a Reco event (e.g., "Silence" or "NoReco") for voice-only
mode. lastCommandOrException is the empty string for
multimodal.

Count is the number of times the QA has been activated
consecutively. Count starts at 1 and has no limit for
voice-only mode. Count is zero for multimodal.

SemanticItemList is, for voice-only mode, an associative
array that maps semantic item ids to semantic item
objects. For multimodal, SemanticItemList will be null.

Note: In multimodal mode, OnClientListening is only
available if the author chooses to use StartEvent. If the
author decides to start reco programmatically, then
OnClientListening is not called, because the author can
detect when reco.start returns successfully.

Note: OnClientListening is ignored when specified in QAs
that do not contain reco objects.

AllowCommands

Optional. Only used in voice-only mode. Indicates whether
or not Commands may be activated for a QA control. When
AllowCommands is set to false, no commands may be
activated. Defaults to true.

PlayOnce

Optional. Only used in voice-only mode. Specifies whether
or not a QA may be activated more than once per page. If
not specified, PlayOnce is set to false. PlayOnce="true"
may be used to author statements like welcoming prompts.
When a QA is reduced to a statement (no reco), setting
PlayOnce="false" will provide dialog authors with the
capability to enable a "repeat" functionality on a page
that reads email messages.

XpathAcceptConfirms

Optional. Only used in voice-only mode. Specifies the path
in the SML document (recognition result) that indicates the
confirm items were accepted. Required if Confirms are
specified. If XpathAcceptConfirms is specified without a
Confirm being specified, it is ignored. XpathAcceptConfirms
must be a valid XML path. An invalid XML path will cause a
redirection to the default error page during run time.

XpathDenyConfirms

Optional. Used only in voice-only mode. Specifies the path
in the SML document that indicates the confirm items were
denied. Required if Confirms are specified. If a Confirm is
specified and XpathDenyConfirms is not set, an exception is
thrown. If XpathDenyConfirms is specified without a Confirm
being specified, it is ignored. XpathDenyConfirms must be a
valid XML path. An invalid XML path will cause a
redirection to the default error page during run time.

AcceptRejectThreshold

Optional. Used only in voice-only mode. If the confidence
for an accept confirm is not above this threshold, no
action will be taken. Legal values are 0-1 and are
platform specific. An exception will be thrown for out of
range AcceptRejectThreshold values. Default is zero.

DenyRejectThreshold

Optional. Used only in voice-only mode. If the confidence
for a deny confirm is not above this threshold, no action
will be taken. Legal values are 0-1 and are platform
specific. An exception will be thrown for out of range
DenyRejectThreshold values. Default is zero.
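The accept/deny paths and their thresholds combine as in the sketch below. Several simplifications are assumptions of this sketch and are not defined by the section. The SML result is flattened into a map from XPath string to `{ confidence }`, whereas real SML is an XML document queried with XPath. The QA property names are illustrative. Checking accept before deny is also an assumed ordering.

```javascript
// Sketch of accept/deny confirmation handling. The recognition result is
// simplified to a hypothetical map from XPath string to { confidence };
// real SML is an XML document queried with XPath.
function resolveConfirmation(recoResult, qa) {
  const accept = recoResult[qa.xpathAcceptConfirms];
  const deny = recoResult[qa.xpathDenyConfirms];
  // If the confidence is not above the threshold (default 0), no action is taken.
  if (accept !== undefined && accept.confidence > (qa.acceptRejectThreshold ?? 0)) {
    return "accept";
  }
  if (deny !== undefined && deny.confidence > (qa.denyRejectThreshold ?? 0)) {
    return "deny";
  }
  return "none";
}
```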

FirstInitialTimeout

Optional. Only used in voice-only mode. Specifies the
initial timeout in msec for the QA when count == 1. The
status of the TargetElements specified in the Confirms
answer list will be set to "Confirmed" if no speech is
detected within FirstInitialTimeout milliseconds. If not
specified, the default value of FirstInitialTimeout is 0,
which means that silence does not imply confirmation of the
Answer. An exception will be thrown if FirstInitialTimeout
is specified for a QA that does not contain Confirms. An
exception will be thrown for negative values of
FirstInitialTimeout.
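The FirstInitialTimeout rules can be sketched as a validation step plus a silence handler. The QA object shape (`firstInitialTimeout`, `confirms` entries with a `semanticItem` id) and the `State` property on semantic items are hypothetical and stand in for the TargetElements of the Confirms list.

```javascript
// Sketch of the FirstInitialTimeout rules. The QA shape
// { firstInitialTimeout, confirms: [{ semanticItem }] } and the State value
// "Confirmed" on semantic items are assumptions of this sketch.
function validateFirstInitialTimeout(qa) {
  if (qa.firstInitialTimeout === undefined) return; // not specified: default 0
  if (qa.firstInitialTimeout < 0) {
    throw new RangeError("FirstInitialTimeout must not be negative");
  }
  if ((qa.confirms ?? []).length === 0) {
    throw new Error("FirstInitialTimeout is specified but the QA has no Confirms");
  }
}

// Called when no speech was detected before the initial timeout expired.
// Returns true if the silence was treated as confirmation.
function onInitialSilence(qa, count, semanticItems) {
  const timeout = qa.firstInitialTimeout ?? 0;
  // FirstInitialTimeout applies only when count == 1, and 0 means silence
  // does not imply confirmation.
  if (count !== 1 || timeout === 0) return false;
  for (const confirm of qa.confirms ?? []) {
    semanticItems[confirm.semanticItem].State = "Confirmed";
  }
  return true;
}
```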

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
QA control will search for the named Style control and will
use any property values specified on the Style as default
values for its own properties. Explicitly set property
values on the control will override those set on the Style.

ConfirmByOmission

Optional. Used only in voice-only mode. Default is true.
This flag controls confirmation of more than one item. If
the flag is set to true, then any semantic items whose
XPath is not present in the reco result will be set to
Confirmed. ConfirmByOmission enables the following
scenario:

(ConfirmByOmission=true)
Q: Flying from?
A: Boston.
Q: Flying to?
A: Seattle.
Q: From Boston to Seattle?
A: From NY.
(Seattle is confirmed as the destination city.)
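The omission rule from the flight scenario can be sketched as follows. The `confirms` entries (`xpathTrigger`, `semanticItem`), the set of XPaths present in the reco result, and the `State` property are hypothetical simplifications.

```javascript
// Sketch of ConfirmByOmission. `confirms` is a hypothetical list of
// { xpathTrigger, semanticItem }; `presentXpaths` is the set of XPaths found
// in the reco result. Returns the ids of the items confirmed by omission.
function applyConfirmByOmission(confirms, presentXpaths, semanticItems, confirmByOmission = true) {
  const confirmed = [];
  if (!confirmByOmission) return confirmed;
  for (const confirm of confirms) {
    if (!presentXpaths.has(confirm.xpathTrigger)) {
      semanticItems[confirm.semanticItem].State = "Confirmed";
      confirmed.push(confirm.semanticItem);
    }
  }
  return confirmed;
}
```

In the scenario, "From NY." only carries the departure city, so the arrival item is confirmed by omission. The departure item is left to the correction logic.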

ConfirmIfEqual

Optional. Used only in voice-only mode. Default is true.
This flag controls the processing of corrections during
confirmation. If ConfirmIfEqual is true and a recognized
correction is the same value already in the semantic item,
the item is marked confirmed. If ConfirmIfEqual is false and
a recognized correction is the same value already in the
semantic item, the item is marked as needing confirmation.
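The ConfirmIfEqual rule reduces to a small branch on equal values. In this sketch, `item` is a hypothetical semantic item with `Text` and `State`, and "NeedsConfirmation" is an assumed name for the "needing confirmation" state. The sketch deliberately leaves corrections that differ from the stored value alone, because ConfirmIfEqual does not govern that case.

```javascript
// Sketch of the ConfirmIfEqual rule for a correction recognized during
// confirmation. `item` is a hypothetical semantic item with Text and State;
// "NeedsConfirmation" is an assumed State name.
// Returns false when the correction differs from the stored value, a case
// this property does not govern.
function applyConfirmIfEqual(item, recognizedValue, confirmIfEqual = true) {
  if (recognizedValue !== item.Text) return false;
  item.State = confirmIfEqual ? "Confirmed" : "NeedsConfirmation";
  return true;
}
```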



Answers

Optional. An array of answer objects. This list of objects
is used both to determine activation and to carry out
semantic processing logic. An exception will be thrown if an
Answers collection contains non-answer objects.

ExtraAnswers

Optional. An array of answer objects. These items are not
used for activation, but they are taken into account when
processing recognition results. If an ExtraAnswer is
recognized, it will overwrite the semantic item it points
to, even if it was previously confirmed.

Confirms

Optional. An array of answer objects. These items are used
for activation if the Answers array is empty, and they
affect the confirmation logic.

Prompt

Optional for multimodal. Required for voice-only. An
exception is thrown if a Prompt is not specified in
voice-only mode.

Reco

Optional for multimodal and voice-only. Typically, only one
Reco can be specified in a QA.

Dtmf

Optional. Only used in voice-only mode. Typically, only one
Dtmf can be specified in a QA.

9 Command control

The Command control provides a way of obtaining user input
that is not an answer to the question at hand (e.g., Help,
Repeat, Cancel) and that does not map to textual input
into primary controls. A Command specifies an activation
scope, which means that its grammar is active (in parallel
with the current recognition grammar) for every QA within
that scope. Commands have a Type attribute, which is used to
implement a chain of events: Commands of the same type at
QAs lower in scope can override superior commands with
context-sensitive behavior (and even different or extended
grammars if necessary). The type is also used to notify the
QA which command was uttered (via the reason parameter).
Commands are not rendered for multimodal mode.
class Command : SpeechControl
{
    string id {get; set;};
    string Scope {get; set;};
    string Type {get; set;};
    string XpathTrigger {get; set;};
    float AcceptCommandThreshold {get; set;};
    string OnClientCommand {get; set;};
    bool AutoPostBack {get; set;};
    TriggeredEventHandler OnTriggered;
    string StyleReference {get; set;};
    Prompt Prompt {get;};
    Grammar Grammar {get;};
    Grammar DtmfGrammar {get;};
}

9.1 Command Properties

All properties of the Command control are available to the
application developer at design time.

Scope

Required. Only used in voice-only mode. Specifies the id of
a QA or other ASP.NET control (e.g., form, panel, or
table). Scope is used in Commands to specify when the
Command's grammars will be active. Exceptions are thrown if
Scope is invalid or not specified.

Type

Required. Only used in voice-only mode. Specifies the type
of command (e.g., 'help', 'cancel') in order to allow the
overriding of identically typed commands at lower levels of
the scope tree. Any string value is possible in this
attribute, so it is up to the author to ensure that types
are used correctly. An exception is thrown if Type is not
specified.

Note: An exception will be thrown if more than one Command
of the same Type has the same Scope, for example, two
Type="Help" Commands for the same QA (Scope="QA1").
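Type-based overriding along the scope tree can be sketched as a walk from the active QA outward. The `{ id, scope, type }` command records and the `scopeChain` array of control ids, ordered from the active QA out to the page, are hypothetical inputs. How RunSpeech actually builds the scope chain is not specified here.

```javascript
// Sketch of Command scope resolution. `commands` is a hypothetical list of
// { id, scope, type }; `scopeChain` lists control ids from the active QA
// outward, e.g. ["QA1", "panel1", "form1"].
function resolveCommands(commands, scopeChain) {
  // One Command per (Scope, Type) pair; a duplicate is an authoring error.
  const seen = new Set();
  for (const c of commands) {
    const key = `${c.scope}|${c.type}`;
    if (seen.has(key)) {
      throw new Error(`More than one Command of Type "${c.type}" has Scope "${c.scope}"`);
    }
    seen.add(key);
  }
  // Walk from the innermost scope outward; the first Command found for a
  // Type overrides any identically typed Command at a superior scope.
  const active = new Map();
  for (const scope of scopeChain) {
    for (const c of commands) {
      if (c.scope === scope && !active.has(c.type)) active.set(c.type, c);
    }
  }
  return active;
}
```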

AcceptCommandThreshold

Optional. Only used in voice-only mode. Specifies the
minimum confidence level of recognition that is necessary
to trigger the command (this is likely to be used when
higher than usual confidence is required, e.g., before
executing the result of a 'Cancel' command). Legal values
are 0-1. Default value is 0. Exceptions will be thrown for
out of range AcceptCommandThreshold values.

If a command is matched (its XpathTrigger is present in the
reco result), no further commands will be processed, and no
Answers, ExtraAnswers, Confirms, etc. will be processed.
Then, if the confidence of the node specified by
XpathTrigger is greater than or equal to the
AcceptCommandThreshold, the active QA's
LastCommandOrException is set to the Command's type, and
the Command's OnClientCommand function is called. Otherwise
(if the confidence of the node is less than the
AcceptCommandThreshold), the active QA's
LastCommandOrException is set to "NoReco" and the active
QA's Reco's OnClientNoReco function is called.
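The matching and threshold logic described above can be sketched as follows. The flattened reco result (XPath to `{ confidence }`), the `activeQA` object with `lastCommandOrException` and `reco.onClientNoReco`, and the lower-camel-case property names are all hypothetical. Commands are assumed to be checked in the order given.

```javascript
// Sketch of Command matching during result processing. The reco result is a
// hypothetical map from XPath to { confidence }; `activeQA` is a hypothetical
// object with lastCommandOrException and reco.onClientNoReco.
// Returns true if a Command consumed the result, in which case no Answers,
// ExtraAnswers or Confirms should be processed.
function processCommands(commands, recoResult, activeQA) {
  for (const command of commands) {
    const node = recoResult[command.xpathTrigger];
    if (node === undefined) continue; // this Command's XpathTrigger is absent
    if (node.confidence >= (command.acceptCommandThreshold ?? 0)) {
      activeQA.lastCommandOrException = command.type;
      if (command.onClientCommand) command.onClientCommand(node);
    } else {
      activeQA.lastCommandOrException = "NoReco";
      if (activeQA.reco.onClientNoReco) activeQA.reco.onClientNoReco();
    }
    return true; // the first matched Command stops all further processing
  }
  return false;
}
```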

XpathTrigger

Required. Only used in voice-only mode. The SML document
path that triggers this command. An exception will be thrown
if XpathTrigger is not specified. XpathTrigger must be a
valid XML path. An invalid XML path will cause a
redirection to the default error page during run time.

OnClientCommand

Optional. Only used in voice-only mode. Specifies the
client-side script function to execute on recognition of
the Command's grammar. The function does not return any
values. The signature for OnClientCommand is as follows:

function OnClientCommand(XMLNode smlNode)

where: smlNode is the matched SML node.

Note: If AutoPostBack is set to true, the OnClientCommand
function is executed before posting back to the server. If
the author wishes to persist any page state across
postback, the OnClientCommand function is a good place to
invoke the ClientViewState object of RunSpeech.

AutoPostBack

Optional. Only used in voice-only mode. Specifies whether
or not the Command control posts back to the server each
time a Command grammar is recognized. Default is false. If
set to true, the server-side Triggered event is fired.

The internal state of the voice-only page is maintained
automatically during postback. Authors may use the
ClientViewState object of RunSpeech to declare and set
additional values they wish to persist across postbacks.

OnTriggered

Optional. Only used in voice-only mode. Specifies a
server-side script function to be executed when the
Triggered event is fired (see the AutoPostBack attribute
above). This handler must have the following form (in C#;
the signature would look slightly different in other
languages):

void myFunction(object sender, TriggeredEventArgs e);

The handler can be assigned in two different ways,
declaratively:

<speech:Command ... OnTriggered="myFunction" ... />

or programmatically:

Command.Triggered += new TriggeredEventHandler(myFunction);

TriggeredEventHandler is a delegate: it specifies the
signature of functions that can handle its associated event
type. It looks like this:

public delegate void TriggeredEventHandler(object sender,
    TriggeredEventArgs e);

where:

TriggeredEventArgs is a class derived from System.EventArgs
that contains one public property, string Value.

An exception will be thrown if AutoPostBack is set to true
and no handler is specified for the Triggered event.
An exception will be thrown if AutoPostBack is set to false
and a handler is specified for the Triggered event.

StyleReference

Optional. Only used in voice-only mode. Specifies the name
of a Style object. At render time, the Command control will
search for the named Style control and will use any
property values specified on the Style as default values
for its own properties. Explicitly set property values on
the control will override those set on the Style.

Prompt

Optional. May be used to specify a prompt to be played for
global commands.

Grammar

Optional. The grammar object that will listen for the
command.

Note: The grammar object is optional because the QA scoped
by this command may contain the rule that generates this
command's XPath. The author has the flexibility of
specifying the rule in the QA control or the Command
control.



DtmfGrammar

Optional. The DtmfGrammar object that will activate the
command. Available at run time.

Note: The DtmfGrammar object is optional because the QA
scoped by this command may contain the rule that generates
this command's XPath. The author has the flexibility of
specifying the rule in the QA control or the Command
control. DtmfGrammars for all Commands along the QA's scope
chain will be combined into the Grammars collection for the
QA's Dtmf object.

Speech Controls does not provide a set of common commands
(e.g., help, cancel, repeat).

10 CompareValidator control 

This control compares two values, applying the operator,
and if the comparison is false, invalidates the item
specified by SemanticItemToValidate. Optionally, both items
(ToCompare and ToValidate) are invalidated. The
CompareValidator is triggered on the client by change or
confirm events; however, validation prompts are played in
SpeechIndex order.

The CompareValidator control is rendered for voice-only
mode. For multimodal, ASP.NET validator controls may be
used.



class CompareValidator : IndexedStyleReferenceSpeechControl
{
    string id {get; set;};
    int SpeechIndex {get; set;};
    ValidationType Type {get; set;};
    string ValidationEvent {get; set;};
    string SemanticItemToCompare {get; set;};
    string ValueToCompare {get; set;};
    string SemanticItemToValidate {get; set;};
    ValidationCompareOperator Operator {get; set;};
    bool InvalidateBoth {get; set;};
    string StyleReference {get; set;};
    Prompt Prompt {get;};
}

10.1 CompareValidator Properties

All properties of the CompareValidator control are only
used in voice-only mode and are available to the
application developer at design time.

SpeechIndex

Optional. Specifies the activation order of
CompareValidator controls on a page. If more than one
control has the same SpeechIndex, they are activated in
source order. In situations where some controls specify
SpeechIndex and some controls do not, those with
SpeechIndex specified will be activated first, then the
rest in source order. SpeechIndex values start at 1. An
exception will be thrown for non-valid values of
SpeechIndex.

Type

Required. Sets the data type of the comparison. Legal values
are "String", "Integer", "Double", "Date", and "Currency".
Default value is "String".

ValidationEvent

Default is "onconfirmed". ValidationEvent may be set to one
of two values, either "onchanged" or "onconfirmed".

If ValidationEvent is set to "onchanged", the
CompareValidator will be run each time the value of the
Text property of the associated SemanticItem changes. The
CompareValidator control will be run before the
SemanticItem's OnChanged handler is called. The
SemanticItem's OnChanged handler will only be called if the
CompareValidator does indeed validate the changed data. If
the CompareValidator invalidates the data, the State of the
SemanticItem is set to Empty and the OnChanged handler is
not called.

If ValidationEvent is set to "onconfirmed", the
CompareValidator will be run each time the State of the
associated SemanticItem changes to Confirmed. The
CompareValidator control will be run before the
SemanticItem's OnConfirmed handler is called. The
SemanticItem's OnConfirmed handler will only be called if
the CompareValidator does indeed validate the changed data.
If the CompareValidator invalidates the data, the State of
the SemanticItem is set to Empty and the OnConfirmed
handler is not called.

After processing all SemanticItems involved in a
recognition turn, RunSpeech starts again. At that point,
the previously failed validators will be active and
RunSpeech will select the first QA/Validator that is active
in SpeechIndex order. It is the author's responsibility to
place the validator controls directly before the QA control
that collects the answer for the SemanticItem in order to
get the correct behavior.
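The ordering between a validator and the SemanticItem's OnChanged or OnConfirmed handler can be sketched as a guard around the handler call. In this sketch, validators are modeled as hypothetical predicate functions over a semantic item with `Text` and `State`. RunSpeech's actual wiring is not shown.

```javascript
// Sketch: validators bound to a semantic item run before its OnChanged or
// OnConfirmed handler. Validators are modeled as hypothetical predicates.
function fireWithValidation(item, validators, handler) {
  for (const isValid of validators) {
    if (!isValid(item)) {
      item.State = "Empty"; // invalidated: the handler is not called
      return false;
    }
  }
  handler(item);
  return true;
}
```

With an "Austin must not equal Austin" check, the arrival-city item is emptied and its handler is skipped, so RunSpeech will prompt for it again on the next pass.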

SemanticItemToCompare

Optional. Either SemanticItemToCompare or ValueToCompare
must be specified. Specifies the id of the SemanticItem
that will be used as the basis for the comparison.
Available at design time and run time. An exception will be
thrown if neither SemanticItemToCompare nor ValueToCompare
is specified.

ValueToCompare

Optional. Either SemanticItemToCompare or ValueToCompare
must be specified. Specifies the value to be used as the
basis for the comparison. The author may wish to specify
the value here instead of taking the value from the
semantic item. If both ValueToCompare and
SemanticItemToCompare are set, SemanticItemToCompare takes
precedence. An exception will be thrown if neither
SemanticItemToCompare nor ValueToCompare is specified.
An exception will be thrown if ValueToCompare cannot be
converted to a valid Type.

SemanticItemToValidate

Required. Specifies the id of the SemanticItem that is
being validated against either ValueToCompare or
SemanticItemToCompare. An exception will be thrown for
unspecified SemanticItemToValidate.

Operator

Optional. One of "Equal", "NotEqual", "GreaterThan",
"GreaterThanEqual", "LesserThan", "LesserThanEqual", or
"DataTypeCheck". Default value is "Equal". The values are
compared in the following order:
value to validate [Operator] value to compare.
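A comparison that honors both Type and Operator might look like the sketch below. The conversion rules are assumptions of this sketch. In particular, "Currency" is treated as a plain decimal number with no symbols or grouping, and dates use `Date.parse`. A failed conversion of the value to validate makes the comparison false, while an unconvertible comparand throws, mirroring the ValueToCompare rule above.

```javascript
// Converts a raw value to the CompareValidator Type; returns undefined when the
// conversion fails. Treating "Currency" as a plain decimal number (no symbols
// or grouping) is an assumption of this sketch.
function convertToType(value, type) {
  const s = String(value).trim();
  switch (type) {
    case "String":
      return String(value);
    case "Integer":
      return /^[+-]?\d+$/.test(s) ? parseInt(s, 10) : undefined;
    case "Double":
    case "Currency": {
      const n = Number(s);
      return s !== "" && Number.isFinite(n) ? n : undefined;
    }
    case "Date": {
      const t = Date.parse(s);
      return Number.isNaN(t) ? undefined : t;
    }
    default:
      throw new Error(`Unknown Type "${type}"`);
  }
}

// Evaluates: value to validate [Operator] value to compare.
// A false result means the SemanticItemToValidate should be invalidated.
function compareValues(valueToValidate, operator, valueToCompare, type = "String") {
  const a = convertToType(valueToValidate, type);
  if (operator === "DataTypeCheck") return a !== undefined;
  const b = convertToType(valueToCompare, type);
  if (b === undefined) throw new Error(`Value to compare is not a valid ${type}`);
  if (a === undefined) return false;
  switch (operator) {
    case "Equal": return a === b;
    case "NotEqual": return a !== b;
    case "GreaterThan": return a > b;
    case "GreaterThanEqual": return a >= b;
    case "LesserThan": return a < b;
    case "LesserThanEqual": return a <= b;
    default: throw new Error(`Unknown Operator "${operator}"`);
  }
}
```

Type matters: `compareValues("10", "GreaterThan", "9", "Integer")` is true, while the same call with the default "String" type is false, because "10" sorts before "9" as text.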

InvalidateBoth

Optional. If true, both SemanticItemToCompare and
SemanticItemToValidate are marked Empty. Default is false
(i.e., invalidate only the SemanticItemToValidate). If
SemanticItemToCompare has not been set (i.e.,
ValueToCompare has been specified), InvalidateBoth is
ignored.

The following example illustrates the usage of the
InvalidateBoth attribute. The scenario is an itinerary
application. The user has already been prompted and has
answered the question for the departure city. At this point
in the dialog, an ASP.NET textbox control has been filled
with the recognition results (assume
txtDepartureCity.Value="Austin").

The next QA prompts the user for the arrival city; its
SemanticItem object binds to txtArrivalCity.Value. In
response to the prompt, the user says "Boston". However,
the recognition engine returns "Austin" (i.e., the arrival
city is the same as the departure city).

The CompareValidator control may be used to direct the
dialog flow in this case to re-prompt the user for both
the departure and arrival cities:

<CompareValidator id="compareCities" SpeechIndex="5"
    Type="String"
    SemanticItemToCompare="si_DepartureCity"
    SemanticItemToValidate="si_ArrivalCity"
    Operator="NotEqual"
    InvalidateBoth="True"
    runat="server">
</CompareValidator>

StyleReference

Optional. Specifies the name of a Style object. At render
time, the CompareValidator control will search for the
named Style control and will use any property values
specified on the Style as default values for its own
properties. Explicitly set property values on the
CompareValidator control will override those set on the
Style.



Prompt

Optional. Prompt to indicate the error.

11 CustomValidator control

The CustomValidator control is used to validate recognition
results when complex validation algorithms are required.
The control allows dialog authors to specify their own
validation routines. The CustomValidator is triggered on
the client by change or confirm events; however, validation
prompts are played in SpeechIndex order.

The CustomValidator control is only rendered for voice-only
mode. For multimodal, ASP.NET validator controls may be
used.



class CustomValidator : IndexedStyleReferenceSpeechControl
{
    string id {get; set;};
    int SpeechIndex {get; set;};
    string ValidationEvent {get; set;};
    string SemanticItemToValidate {get; set;};
    string ClientValidationFunction {get; set;};
    string StyleReference {get; set;};
    Prompt Prompt {get;};
}



11.1 CustomValidator Properties 

All properties of the CustomValidator control are only used
in voice-only mode and are available to the application
developer at design time.

SpeechIndex

Optional. Only used in voice-only mode. Specifies the
activation order of speech controls on a page and the
activation order of composite controls. If more than one
control has the same SpeechIndex, they are activated in
source order. In situations where some controls specify
SpeechIndex and some controls do not, those with
SpeechIndex specified will be activated first, then the
rest in source order. SpeechIndex values start at 1. An
exception will be thrown for non-valid values of
SpeechIndex.

ValidationEvent

Default is "onconfirmed". ValidationEvent may be set to one
of two values, either "onchanged" or "onconfirmed".

If ValidationEvent is set to "onchanged", the
CustomValidator will be run each time the value of the Text
property of the associated SemanticItem changes. The
CustomValidator control will be run before the
SemanticItem's OnChanged handler is called. The
SemanticItem's OnChanged handler will only be called if the
CustomValidator does indeed validate the changed data. If
the CustomValidator invalidates the data, the State of the
SemanticItem is set to Empty and the OnChanged handler is
not called.

If ValidationEvent is set to "onconfirmed", the
CustomValidator will be run each time the State of the
associated SemanticItem changes to Confirmed. The
CustomValidator control will be run before the
SemanticItem's OnConfirmed handler is called. The
SemanticItem's OnConfirmed handler will only be called if
the CustomValidator does indeed validate the changed data.
If the CustomValidator invalidates the data, the State of
the SemanticItem is set to Empty and the OnConfirmed
handler is not called.

After processing all SemanticItems involved in a
recognition turn, RunSpeech starts again. At that point,
the previously failed validators will be active and
RunSpeech will select the first QA/Validator that is active
in SpeechIndex order. It is the author's responsibility to
place the validator controls directly before the QA control
that collects the answer for the SemanticItem in order to
get the correct behavior.

SemanticItemToValidate

Required. Specifies the id of the SemanticItem that is
being validated. An exception will be thrown for
unspecified SemanticItemToValidate.

ClientValidationFunction

Required. Specifies a function that checks the value of
SemanticItemToValidate.AttributeToValidate and returns true
or false, indicating whether the value is valid or invalid.
The signature for ClientValidationFunction is as follows:

bool ClientValidationFunction(string value)

where:

value is the contents of
ElementToValidate.AttributeToValidate.

An exception will be thrown if ClientValidationFunction is
not specified.
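As an example of a "complex validation algorithm" that a ClientValidationFunction could host, the sketch below accepts a recognized card number only if it passes the Luhn checksum. The function name, the choice of the Luhn check, and the 12-19 digit length rule are illustrations chosen for this sketch and do not come from the specification.

```javascript
// Hypothetical ClientValidationFunction: accepts a recognized card number
// only if it has 12-19 digits and passes the Luhn checksum. Spaces and
// hyphens are ignored.
function validateCardNumber(value) {
  const digits = String(value).replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let doubleIt = false; // double every second digit, starting from the right
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" has char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}
```

Such a function would be named in the control's ClientValidationFunction property. Returning false causes the SemanticItem to be set to Empty, as described under ValidationEvent.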

StyleReference

Optional. Specifies the name of a Style object. At render
time, the CustomValidator control will search for the named
Style control and will use any property values specified on
the Style as default values for its own properties.
Explicitly set property values on the control will override
those set on the Style.

Prompt 

Optional. Prompt to indicate the error.

12 Answer object

The Answer object contains information on how to process
recognition results and bind the results to controls on an
ASP.NET page.

How the Answer object is used.

Voice-only mode.

The RunSpeech script uses the Answer object to perform
answer processing on the client. Answer processing begins
when the OnReco event fired by the speech platform is
received by the client. The resultant SML document returned
by the speech platform is searched for the node specified
by the required XpathTrigger attribute. If the XpathTrigger
node is found in the SML document and contains a non-null
value, the value is filled into the semantic item
specified in the SemanticItem property of the answer.

If the XpathTrigger node does not exist in the SML document
or has a null value, RunSpeech looks for the next QA to
activate.

After the non-null value of the XpathTrigger node is found,
RunSpeech invokes the ClientNormalizationFunction (if
specified). The ClientNormalizationFunction returns a text
string that reflects the author-defined transformation of
the value of the XpathTrigger node. For example, the author
may wish to transform the date "November 17, 2001" returned
by the speech platform to "11/17/2001". Semantic items are
used for both simple and complex data binding.
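The date example can be written as a ClientNormalizationFunction with the signature `string ClientNormalizationFunction(XMLNode SMLnode, object SemanticItem)`. This is a sketch under several assumptions: the recognizer returns English month names in the form "Month D, YYYY", and the node text is read through `textContent` (W3C DOM) or `text` (MSXML). The function name `normalizeDate` is hypothetical.

```javascript
// Month names as the recognizer is assumed to return them.
const MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Hypothetical ClientNormalizationFunction: "November 17, 2001" -> "11/17/2001".
// Reads the node text via textContent (W3C DOM) or text (MSXML); values in any
// other format are returned unchanged.
function normalizeDate(SMLnode, SemanticItem) {
  const raw = String(SMLnode.textContent ?? SMLnode.text ?? "").trim();
  const match = /^([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})$/.exec(raw);
  if (match === null) return raw;
  const month = MONTH_NAMES.indexOf(match[1]) + 1;
  if (month === 0) return raw; // not a recognized month name
  return `${month}/${match[2]}/${match[3]}`;
}
```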

The SML document returned by the speech platform may contain
a platform-specific confidence rating for each XpathTrigger
node. During answer processing, RunSpeech compares this
confidence rating to the value specified in the
ConfirmThreshold attribute of the Answer object. Results of
the comparison are then used to set the internal confirmed
state of the semantic item. This state information is
subsequently used to determine whether or not an answer
requires confirmation from the user.

RunSpeech internally marks an answer as needing
confirmation if the confidence returned with the
XpathTrigger is less than or equal to the value of the
ConfirmThreshold attribute. Otherwise, RunSpeech internally
marks the semantic item associated with the answer as
confirmed. This internal state information is used during
confirmation processing.
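The voice-only answer processing steps above (find the trigger node, optionally normalize, fill the semantic item, set the confirmation state) can be sketched as follows. Several details are simplifications assumed for this sketch. The SML document is replaced by a map from XPath to `{ value, confidence }`. Semantic items expose `Text` and `State`, and "NeedsConfirmation" is an assumed state name. The lower-camel-case answer properties mirror the Answer class but are not its real client-side representation.

```javascript
// Sketch of voice-only answer processing. The reco result is simplified to a
// map from XPath to { value, confidence }; real SML is XML queried by XPath.
// Semantic items are assumed to expose Text and State.
function processAnswer(answer, recoResult, semanticItems) {
  const node = recoResult[answer.xpathTrigger];
  // Missing or null trigger node: RunSpeech moves on to the next QA.
  if (node === undefined || node.value === null) return false;
  const item = semanticItems[answer.semanticItem];
  item.Text = answer.clientNormalizationFunction
    ? answer.clientNormalizationFunction(node, item)
    : String(node.value);
  // Confidence at or below ConfirmThreshold (default 0) needs confirmation.
  const threshold = answer.confirmThreshold ?? 0;
  item.State = node.confidence <= threshold ? "NeedsConfirmation" : "Confirmed";
  return true;
}
```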

Multimodal.

The Answer object is used in multimodal scenarios by the
Multimodal.js script just as it is used by RunSpeech in
voice-only mode (described above), with one exception. In
multimodal, platform-specific confidence ratings are not
compared to the ConfirmThreshold attribute of the Answer
object; therefore, internal state information for each
answer is not maintained. Confirmation of results is done
visually. If an incorrect result is bound to a visual
control, the user sees the problem and may then initiate
another speech input action to correct the error.

Rendered for both multimodal and voice-only modes.

class Answer : Control
{
    string id {get; set;};
    float Reject {get; set;};
    float ConfirmThreshold {get; set;};
    string XpathTrigger {get; set;};
    string SemanticItem {get; set;};
    string ClientNormalizationFunction {get; set;};
    string StyleReference {get; set;};
}

12.1 Answer Properties 

All properties of the Answer object are available to the
application developer at design time.

Reject 

Optional. Used in both multimodal and voice-only modes.
Specifies the rejection threshold for the Answer. Answers
having confidence values below Reject will cause a noReco
event to be fired. If not specified, the value 0 will be
used. Legal values are 0-1 and are platform specific. An
exception will be thrown for out-of-range Reject values.

Rejected Answers are treated as if they were not present in
the reco result to begin with. If, after this processing,
no relevant information remains (no Answers, ExtraAnswers,
Confirms, Commands, or XpathAcceptConfirms/XpathDenyConfirms),
an onnoreco event is fired (which exactly mimics the tags
version).
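The rejection pass described above might look like the following sketch. The shape of the recognition result (arrays named `answers`, `extraAnswers`, `confirms`, `commands`, and a boolean `acceptOrDeny` standing in for an XpathAcceptConfirms/XpathDenyConfirms match) is an assumption made for illustration; each answer-like item carries its own `reject` value, defaulting to 0 as stated above.

```javascript
// Hypothetical sketch of Reject processing. Field names are illustrative.
function applyReject(result) {
  // An item survives unless its confidence is strictly below its own Reject.
  const keep = (item) =>
    item.confidence >= (item.reject === undefined ? 0 : item.reject);

  const answers = result.answers.filter(keep);
  const extraAnswers = result.extraAnswers.filter(keep);
  const confirms = result.confirms.filter(keep);

  // Fire onnoreco only when nothing relevant remains after rejection.
  const remaining =
    answers.length + extraAnswers.length + confirms.length +
    result.commands.length;
  const fireNoReco = remaining === 0 && !result.acceptOrDeny;

  return { answers, extraAnswers, confirms, commands: result.commands, fireNoReco };
}

const outcome = applyReject({
  answers: [{ id: "A1", confidence: 0.2, reject: 0.3 }],
  extraAnswers: [],
  confirms: [],
  commands: [],
  acceptOrDeny: false,
});
console.log(outcome.fireNoReco); // true
```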
ConfirmThreshold

Optional. Used in voice-only mode. Specifies the confidence
level that recognition must exceed for this item to be
marked as confirmed. If the confidence of the matched item
is less than or equal to this threshold, the item is marked
as needing confirmation. Legal values are 0-1. The default
value is 0. An exception will be thrown for out-of-range
ConfirmThreshold values.

XpathTrigger

Required for Answers and ExtraAnswers. Optional for
Confirms. Used in both multimodal and voice-only modes.
Specifies which part of the SML document this answer refers
to. It is specified as an XPath on the SML output from
recognition. An exception will be thrown if XpathTrigger is
not specified for Answers or ExtraAnswers. XpathTrigger
must be a valid XML path. An invalid XML path will cause a
redirection to the default error page at run time.

For Confirms, if XpathTrigger is not set or is set to the
empty string, the confirm will not allow for correction.
Used in this way, XpathTrigger enables yes/no confirmations.
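To make the XpathTrigger lookup concrete, the sketch below resolves simple absolute paths such as `/sml/departCity` against a recognition result. It is a deliberate simplification and an assumption: the SML output is modeled as nested plain objects rather than an XML document, and only child-element steps are supported, whereas a real implementation would evaluate full XPath over the SML XML.

```javascript
// Hypothetical, simplified stand-in for XPath evaluation over SML output.
// Each node is { name, text?, children?: { childName: node } }.
function resolveTrigger(smlRoot, xpathTrigger) {
  if (typeof xpathTrigger !== "string" || !xpathTrigger.startsWith("/")) {
    throw new Error(`invalid XpathTrigger: ${xpathTrigger}`);
  }
  const steps = xpathTrigger.split("/").slice(1); // "/sml/x" -> ["sml", "x"]
  let node = { children: { [smlRoot.name]: smlRoot } }; // virtual document node
  for (const step of steps) {
    const kids = node.children || {};
    if (!Object.prototype.hasOwnProperty.call(kids, step)) {
      return null; // the path matched nothing in this result
    }
    node = kids[step];
  }
  return node;
}

const sml = {
  name: "sml",
  children: { departCity: { name: "departCity", text: "Seattle" } },
};
console.log(resolveTrigger(sml, "/sml/departCity").text); // Seattle
console.log(resolveTrigger(sml, "/sml/arrivalCity")); // null
```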

SemanticItem

Optional. Used in both multimodal and voice-only modes.
Specifies the id of the SemanticItem object to which this
Answer's recognition result is written.

ClientNormalizationFunction 

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side function that takes the matched
SML node as a parameter and returns a string that reflects
an author-specified normalization (transformation) of the
recognized item. The signature for
ClientNormalizationFunction is as follows:

string ClientNormalizationFunction(XMLNode SMLnode,
                                   object SemanticItem)

where:

SMLnode is the node specified by the XpathTrigger.

SemanticItem is the client-side SemanticItem object
specified in the Answer object.
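A ClientNormalizationFunction could, for instance, map a spoken airport name to an airport code. The sketch below is hypothetical: the function name `NormalizeAirport`, the lookup table, and the modeling of the SML node as an object with a `text` field are assumptions (a browser XML DOM node would expose its content through properties such as `textContent`).

```javascript
// Hypothetical ClientNormalizationFunction: (SMLnode, SemanticItem) -> string.
const AIRPORT_CODES = { "kennedy": "JFK", "la guardia": "LGA" }; // illustrative

function NormalizeAirport(smlNode, semanticItem) {
  // semanticItem is available for context (for example its attributes),
  // but this simple normalization does not need it.
  const spoken = smlNode.text.trim();
  const key = spoken.toLowerCase();
  return Object.prototype.hasOwnProperty.call(AIRPORT_CODES, key)
    ? AIRPORT_CODES[key]
    : spoken; // fall back to the recognized text
}

console.log(NormalizeAirport({ text: " Kennedy " }, null)); // JFK
console.log(NormalizeAirport({ text: "Islip" }, null)); // Islip
```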

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Answer object will search for the named Style control and
will use any property values specified on the Style as
default values for its own properties. Property values set
explicitly on the Answer object override those set on the
referenced Style.



13 SemanticMap Control 

SemanticMap is a container of SemanticItem objects.

class SemanticMap : SpeechControl
{
    SemanticItemCollection SemItems {get;}
    SemanticItem GetSemanticItem(string id);
}

13.1 SemanticMap Properties 
SemItems

A collection of Semanticltem objects. 

13.2 SemanticMap Methods
GetSemanticItem

This is a function that takes the id of a SemanticItem and
returns the SemanticItem object. The signature of
GetSemanticItem is:

function GetSemanticItem(string id)
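A client-side SemanticMap can be pictured as a map from semantic item id to object. The class below is a hypothetical sketch; the `add` method and the choice to return `null` for an unknown id are assumptions, since the text above does not say what GetSemanticItem does when the id is missing.

```javascript
// Hypothetical sketch of a SemanticMap: a container of semantic items by id.
class SemanticMap {
  constructor() {
    this.semItems = new Map(); // id -> semantic item object
  }

  add(item) {
    if (this.semItems.has(item.id)) {
      throw new Error(`duplicate semantic item id: ${item.id}`);
    }
    this.semItems.set(item.id, item);
  }

  // Mirrors GetSemanticItem(string id); returns null when the id is unknown.
  GetSemanticItem(id) {
    return this.semItems.has(id) ? this.semItems.get(id) : null;
  }
}

const map = new SemanticMap();
map.add({ id: "siService", value: null });
console.log(map.GetSemanticItem("siService").id); // siService
console.log(map.GetSemanticItem("siMissing")); // null
```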

14 SemanticItem object

The SemanticItem object describes where and when an
Answer's recognition results are written to visual controls
on a page. The object also keeps track of the current state
of Answers, i.e., whether an Answer has changed or been
confirmed.

class SemanticItem : Control
{
    string id {get; set;}
    string TargetElement {get; set;}
    string TargetAttribute {get; set;}
    bool BindOnChanged {get; set;}
    string BindAt {get; set;}
    bool AutoPostBack {get; set;}
    string OnClientChanged {get; set;}
    string OnClientConfirmed {get; set;}
    SemanticEventHandler Changed;
    SemanticEventHandler Confirmed;
    string Text {get;}
    SemanticState State {get;}
    StringDictionary Attributes {get; set;}
    string StyleReference {get;}
}

14.1 SemanticItem Properties

id

Required. The programmatic id of this semantic item.

TargetElement

Optional. Used in both multimodal and voice-only modes.
Specifies the id of the visual control to which the
recognition results should be written. If specified,
default binding will occur when the value is changed or
confirmed, depending on the value of BindOnChanged. An
exception is thrown if TargetElement is the id of multiple
controls.

TargetAttribute

Optional. Used in both multimodal and voice-only modes.
Specifies the property name of the TargetElement to which
this answer should be written. The default value is null.
An exception will be thrown if TargetElement is specified
and TargetAttribute is not specified.

BindOnChanged 

Optional. Used in voice-only mode; ignored in multimodal.
Default is false. In voice-only mode, BindOnChanged
controls when to bind recognition results to visual
elements.

A value of true causes binding every time the value of the
SemanticItem changes.

A value of false causes binding only when the SemanticItem
has been confirmed.

BindAt 

Optional. Used in both multimodal and voice-only modes. Can
be omitted or set to "server". Default is null (omitted).
If BindAt is set to "server", it indicates that the
TargetElement/TargetAttribute pair refers to a server-side
control or property. An exception will be thrown when
BindAt is set to an invalid value.

If BindAt is "server", an exception will be thrown if:

- SemanticItem.TargetElement is not a server-side
  control, or

- SemanticItem.TargetAttribute is not a member of the
  control specified by SemanticItem.TargetElement, or

- SemanticItem.TargetAttribute is a member of
  SemanticItem.TargetElement, but is not of type string,
  or

- SemanticItem.TargetAttribute is a string, but is
  read-only.

AutoPostBack 

Optional. Used in both multimodal and voice-only modes.
Specifies whether or not the control posts back to the
server when the binding event is fired. The binding event
can be onChanged or onConfirmed and is controlled by the
value of BindOnChanged. Default is false.

The state of the voice-only page is maintained
automatically during postback. Authors may use the
ClientViewState object of RunSpeech to declare and set any
additional values they wish to persist across postbacks.

OnClientChanged 

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side function to be called when the
value of the Text property of this SemanticItem changes.
The function does not return any values. The signature for
OnClientChanged is as follows:

function OnClientChanged(object SemanticItem)

where SemanticItem is the client-side SemanticItem object.

Note: If AutoPostBack is set to true, the OnClientChanged
function is executed before posting back to the server. If
the author wishes to persist any page state across
postback, the OnClientChanged function is a good place to
access the ClientViewState object of RunSpeech.
OnClientConfirmed

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side function to be called when this
SemanticItem's value is confirmed. The function does not
return any values. The signature for OnClientConfirmed is
as follows:

function OnClientConfirmed(object SemanticItem)

where SemanticItem is the client-side SemanticItem object.

Note: If AutoPostBack is set to true, the OnClientConfirmed
function is executed before posting back to the server. If
the author wishes to persist any page state across
postback, the OnClientConfirmed function is a good place to
access the ClientViewState object of RunSpeech.

Changed

Optional. Used in both multimodal and voice-only modes.
Specifies a server-side script function to be executed when
the Changed event is fired.

The signature of a SemanticEventHandler is as follows (in C#;
the signature would look slightly different in other languages):

public delegate void SemanticEventHandler(object sender,
                                          SemanticEventArgs e);

where:

SemanticEventArgs is a class derived from System.EventArgs.

public class SemanticEventArgs : EventArgs
{
    public string Text {get;}
    public StringDictionary Attributes {get;}
}

Text

Returns the value that this SemanticItem has been set
to.

State

Returns the state of this SemanticItem.

Confirmed

Optional. Used in both multimodal and voice-only modes.
Specifies a server-side script function to be executed when
the Confirmed event is fired. In multimodal mode, the
Confirmed event will be fired immediately after the Changed
event.

The signature of a SemanticEventHandler is as follows (in C#;
the signature would look slightly different in other languages):

public delegate void SemanticEventHandler(object sender,
                                          SemanticEventArgs e);

where:

SemanticEventArgs is a class derived from System.EventArgs.

public class SemanticEventArgs : EventArgs
{
    public string Text {get;}
    public StringDictionary Attributes {get;}
}

Text

Read only. Returns the value that this SemanticItem
has been set to.

State

Read only. Returns the state of this SemanticItem.

Text 

The text value that this SemanticItem has been set to.
Default is null. 



State

The confirmation state of this SemanticItem. Values of
State will be one of SemanticState.Empty,
SemanticState.NeedsConfirmation or SemanticState.Confirmed.



Attributes

Optional. Used in both multimodal and voice-only modes.
This is a collection of name/value pairs. Attributes is
used to pass user-defined information to the client-side
semantic item and back to the server (the two are kept
synchronized). Attributes may only be set programmatically.
For example:

SemanticItem.Attributes["myvarname"] = "myvarvalue";

Attributes are not cleared when the SemanticItem is reset
by the system. If developers wish to reset the attributes,
they must do so manually.



StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
SemanticItem object will search for the named Style
control and will use any property values specified on the
Style as default values for its own properties. Property
values set explicitly on the SemanticItem object override
those set on the referenced Style.

14.2 SemanticItem Client-side object

// Notation doesn't imply programming language
class SemanticItem
{
    SemanticItem(sco, id, targetElement, targetAttribute,
                 bindOnChanged, bindAtServer, autoPostback,
                 onClientChanged, onClientConfirmed,
                 hiddenFieldID, value, state);

    SetText(string text, boolean isConfirmed);
    Confirm();
    Clear();
    Empty();
    AddValidator(validator);
    IsEmpty();
    NeedsConfirmation();
    IsConfirmed();
    Encode();

    Object value;      // Read only
    string state;      // Read only
    object attributes;
}

SetText(string text, boolean isConfirmed)

The SetText method of the client-side semantic item object
is used to alter the value property. The parameters are:

string text: the string which will become the value of
the Semantic Item.

boolean isConfirmed: determines whether the Semantic
Item state property is "confirmed" (if true) or "needs
confirmation" (if false).



Confirm()

This method sets the state property of the Semantic Item
to "confirmed."

Clear()

This method sets the value property of the Semantic Item to
NULL and sets the state property to "empty."

Empty()

AddValidator(validator)

IsEmpty()

This method checks the state property of the Semantic Item
and returns true if it is "empty" and false if it is
"needs confirmation" or "confirmed."

NeedsConfirmation()

This method checks the state property of the Semantic Item
and returns true if it is "needs confirmation" and false if
it is "empty" or "confirmed."

IsConfirmed()

This method checks the state property of the Semantic Item
and returns true if it is "confirmed" and false if it is
"needs confirmation" or "empty."

Encode()

Object value

Read only.

string state

Read only.

object attributes
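The state behavior of the client-side object can be sketched as follows. This is a hypothetical JavaScript rendering, not the shipped client script: the constructor is reduced to an id (the real one takes the many arguments listed above), `value` and `state` are plain fields rather than read-only properties, and Empty, AddValidator and Encode are omitted because their behavior is not described here. Clear() leaves `attributes` untouched, an assumption that extends the note on Attributes to the client side.

```javascript
// Hypothetical sketch of the client-side SemanticItem state handling.
class SemanticItem {
  constructor(id) {
    this.id = id;
    this.value = null;    // read only in the described object
    this.state = "empty"; // "empty" | "needs confirmation" | "confirmed"
    this.attributes = {}; // author-defined name/value pairs
  }

  SetText(text, isConfirmed) {
    this.value = text;
    this.state = isConfirmed ? "confirmed" : "needs confirmation";
  }

  Confirm() {
    this.state = "confirmed";
  }

  Clear() {
    // Resets value and state; attributes are deliberately kept.
    this.value = null;
    this.state = "empty";
  }

  IsEmpty() { return this.state === "empty"; }
  NeedsConfirmation() { return this.state === "needs confirmation"; }
  IsConfirmed() { return this.state === "confirmed"; }
}

const si = new SemanticItem("siTb1");
si.SetText("Seattle", false);
console.log(si.NeedsConfirmation()); // true
si.Confirm();
console.log(si.IsConfirmed()); // true
```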
14.3 Run-time Behavior

As a general rule, the order of execution on the client for
every transition Empty->NeedsConfirmation or
NeedsConfirmation->Confirmed is:

- Client-side binding (if needed)

- Client-side event

- If (AutoPostBack), trigger submit.

On the server, the order of execution is:

- Server-side binding (if needed)

- Server-side event.

If the semantic item is programmatically changed on the
server, no events (server-side or client-side) will be fired.

If (BindOnChanged = false) and (AutoPostBack = true) and both
Changed and Confirmed handlers are set, both events will be
triggered, in order.

Changed events will be fired on the server (if needed and
handlers are set) even if the server-side value is the same
as the previous one (i.e., it did not apparently change).

If AutoPostBack is set to true, the controls will cause two
postbacks, synchronized with onChanged and onConfirmed.
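The client-side ordering above (binding, then the client event, then an optional submit) can be sketched as a small function. This is hypothetical and assumes voice-only mode, where BindOnChanged applies; the item fields mirror the SemanticItem properties, and a log array stands in for real DOM binding and form submission.

```javascript
// Hypothetical sketch of one client-side state transition in voice-only mode.
// transition: "changed" (Empty -> NeedsConfirmation) or "confirmed"
// (NeedsConfirmation -> Confirmed).
function runTransition(item, transition, log) {
  // 1. Client-side binding: on change if BindOnChanged, else on confirmation.
  const bindingEvent = item.bindOnChanged ? "changed" : "confirmed";
  if (item.targetElement && transition === bindingEvent) {
    log.push(`bind ${item.targetElement}.${item.targetAttribute}`);
  }
  // 2. Client-side event.
  const handler =
    transition === "changed" ? item.onClientChanged : item.onClientConfirmed;
  if (typeof handler === "function") {
    log.push(`event ${transition}`);
    handler(item);
  }
  // 3. Submit if AutoPostBack (one postback per transition).
  if (item.autoPostBack) {
    log.push("submit");
  }
}

const log = [];
const item = {
  targetElement: "txtCity", targetAttribute: "value",
  bindOnChanged: false, autoPostBack: true,
  onClientChanged: () => {}, onClientConfirmed: () => {},
};
runTransition(item, "changed", log);
runTransition(item, "confirmed", log);
console.log(log.join(" | "));
// event changed | submit | bind txtCity.value | event confirmed | submit
```

With BindOnChanged false and AutoPostBack true, the two transitions produce the two postbacks described above, and binding happens only on confirmation.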

15 Prompt object

The Prompt object contains information on how to play
prompts. All the properties defined are read/write
properties.

Rendered for voice-only. Not rendered for multimodal.

How Prompt object is used
Voice-only

The Prompt object is a required element of the QA control.
RunSpeech uses the Prompt object to select the appropriate
text for the prompt and then play the prompt on the client.

After RunSpeech determines which QA to activate, it either
increments or initializes the count attribute of the QA.
The count attribute is incremented if the QA being
activated was the same QA that was active during the last
loop through RunSpeech. The count attribute is initialized
to count = 1 if this is the first time the QA has been
activated. The QA's count attribute may be used by the
script specified in the PromptSelectFunction attribute of
the Prompt object.
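The count bookkeeping can be sketched as below. The `session` object that remembers the last active QA and the function name `activateQA` are hypothetical; the sketch only restates the increment-or-initialize rule above.

```javascript
// Hypothetical sketch of maintaining a QA's count attribute.
function activateQA(session, qa) {
  if (session.lastActiveQA === qa) {
    qa.count += 1; // same QA as the previous loop: consecutive activation
  } else {
    qa.count = 1; // a different QA was active last time (or none was)
  }
  session.lastActiveQA = qa;
  return qa.count;
}

const session = { lastActiveQA: null };
const welcomeQA = { id: "welcomeQA", count: 0 };
const otherQA = { id: "otherQA", count: 0 };
console.log(activateQA(session, welcomeQA)); // 1
console.log(activateQA(session, welcomeQA)); // 2 (e.g., after a timeout)
console.log(activateQA(session, otherQA)); // 1
```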

RunSpeech then sets out to determine which text will be
synthesized and played back to the user. The dialog author
has the option of providing a script function for prompt
text that is complex to build, or simply specifying the
prompt text as content of the Prompt object. If RunSpeech
detects the existence of an author-specified
PromptSelectFunction, it passes the text returned from the
PromptSelectFunction to the speech platform for synthesis
and playback to the user. Otherwise RunSpeech will pass the
text in the content of the Prompt object to the speech
platform.
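The choice between a PromptSelectFunction and the Prompt object's content reduces to a simple precedence rule, sketched below. The `prompt` object with `promptSelectFunction` and `content` fields is a hypothetical stand-in for the rendered Prompt object.

```javascript
// Hypothetical sketch: an author-supplied PromptSelectFunction takes
// precedence; otherwise the Prompt object's content is used as-is.
function selectPromptText(prompt, lastCommandOrException, count, semanticItemList) {
  if (typeof prompt.promptSelectFunction === "function") {
    return prompt.promptSelectFunction(lastCommandOrException, count, semanticItemList);
  }
  return prompt.content;
}

const staticPrompt = { content: "What is your desired itinerary?" };
const dynamicPrompt = {
  content: "ignored when a select function is present",
  promptSelectFunction: (last, count) =>
    last === "SILENCE"
      ? "I'm sorry, I didn't catch that."
      : "From which airport would you like to depart?",
};
console.log(selectPromptText(staticPrompt, "", 1, {}));
console.log(selectPromptText(dynamicPrompt, "SILENCE", 2, {}));
```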

If a serious or fatal error occurs during the synthesis
process, the speech platform will fire the onError event.
RunSpeech receives this event, sets lastCommandOrException
to "PromptError" and calls the script function specified by
the OnClientError attribute. The dialog author may then
choose to take appropriate action based upon the type of
error that occurred.

After the prompt playback has finished, the speech platform
fires the oncomplete event, which is caught by RunSpeech.
RunSpeech then looks for the Reco object associated with
the current QA. If a Reco object is found (i.e., the QA is
not just a prompting mechanism), RunSpeech requests the
speech platform to start the recognition process.

Finally, RunSpeech examines the value of the PlayOnce
attribute of the QA containing the Prompt object. If
PlayOnce is true, RunSpeech disables the Prompt object for
subsequent activations of this same QA.

If speech is detected during the playing of the prompt, the
playback of the prompt will be stopped automatically by the
platform. RunSpeech catches the onbargein event and halts
execution. Since a prompt.oncomplete event may not follow
a barge-in, RunSpeech resumes when a listen event is
received.

If a bookmark is encountered, RunSpeech activates the
function specified by the OnClientBookmark property.

Multimodal

The Prompt object is not used in multimodal scenarios.

PromptSelectFunction

The following three examples illustrate using the
PromptSelectFunction to select or modify prompt text
using the parameters available to PromptSelectFunction.

The first example shows how to use the count parameter to
select a prompt based upon the number of times the QA has
been activated. The scenario is:

A user calls a menu-based service and enters a password.
Server-side processing determines the user's first and last
name and inserts the name information into hidden textboxes
(txtFirstName.value and txtLastName.value) on the welcome
page. The welcome page contains a QA which prompts the user
to select the desired service. The QA's Prompt object is
built to handle 1) the prompt to play for a first-time pass
and 2) the prompt to play if the user fails to select a
service at the first prompting (i.e., the same QA is
activated after a timeout expires).

<speech:QA id="welcomeQA" runat="server">
    <Prompt id="welcomePrompt"
            PromptSelectFunction="SelectWelcomePrompt" />
    <Reco id="welcomeReco" mode="automatic">
        <Grammars>
            <speech:grammar id="welcomeGrammar"
                            src="http://mysite/services.xml"
                            runat="server" />
        </Grammars>
    </Reco>
    <Answers>
        <speech:answer id="servicesAnswer" SemanticItem="siService"
                       runat="server" />
    </Answers>
</speech:QA>

<script>
// count is the number of consecutive activations of welcomeQA.
function SelectWelcomePrompt(lastCommandOrException, count, assocArray)
{
    switch (count)
    {
        case 1:
            return "Welcome to Acme Services " +
                "<SALT:value>txtFirstName.value</SALT:value>. " +
                "Please select the Email, Calendar or Stock service.";
        case 2:
        case 3:
            return "Welcome. Please select the Email, Calendar or Stock service.";
        default:
            return "I'm sorry <SALT:value>txtFirstName.value</SALT:value>, " +
                "we're having communication problems. Goodbye.";
    }
}
</script>

The next example shows how to use the
lastCommandOrException parameter to modify a prompt based
upon a previous event in the dialog. The scenario is:

A user is asked to provide the name of a departing airport.
The QA contains a Prompt object that is built to handle the
initial prompt, a prompt if the user asks for help, and a
prompt if the user fails to respond (i.e., a timeout
occurs).

<speech:qa id="qa1" runat="server">
    <Prompt id="prompt1"
            PromptSelectFunction="SelectDepartingAirport" />
    <Reco id="reco1" mode="automatic">
        <Grammars>
            <speech:grammar id="gram1"
                            src="http://mysite/NYAirport.xml"
                            runat="server" />
        </Grammars>
    </Reco>
    <Answers>
        <speech:answer id="ans1" SemanticItem="siAns1"
                       runat="server" />
    </Answers>
</speech:qa>

<speech:command id="command1" runat="server"
                XpathTrigger="/sml/help" scope="qa1" type="HELP">
    <Grammar src="http://mysite/help.xml" runat="server" />
</speech:command>
<script>
function SelectDepartingAirport(lastCommandOrException, count, assocArray)
{
    // First activation of the QA: play the initial prompt.
    if (count == 1) return "From which airport would you like to depart?";

    switch (lastCommandOrException)
    {
        case "SILENCE":
            return "I'm sorry, I didn't catch that. " +
                "From which airport would you like to depart?";
        case "HELP":
            return "You may choose from Kennedy, La Guardia, or that " +
                "little airport on Long Island. From which airport " +
                "would you like to depart?";
        default:
            return "What we have here is a failure to communicate. Goodbye.";
    }
}
</script>

The last example shows how to use the assocArray parameter
to modify a prompt during a confirmation pass. The scenario
is:

The user is asked to provide itinerary details: departure
and arrival cities and travel date. The QA is constructed
to implicitly confirm the departure and arrival city
information and explicitly confirm the travel date. The
Prompt object is built to provide appropriate prompting of
items requiring confirmation.

<speech:qa id="qa1" runat="server">
    <Prompt id="prompt1"
            InLinePrompt="What is your desired itinerary?"></Prompt>
    <Reco id="reco1" mode="Automatic">
        <Grammars>
            <speech:grammar id="grm1"
                            src="http://mysite/city_date.xml"
                            runat="server" />
        </Grammars>
    </Reco>
    <Answers>
        <speech:answer id="A1" XpathTrigger="/sml/departCity"
                       SemanticItem="siTb1" ConfirmThreshold="0.90"
                       runat="server" />
        <speech:answer id="A2" XpathTrigger="/sml/arrivalCity"
                       SemanticItem="siTb2" ConfirmThreshold="0.90"
                       runat="server" />
        <speech:answer id="A3" XpathTrigger="/sml/departDate"
                       SemanticItem="siTb3" ConfirmThreshold="1.00"
                       runat="server" />
    </Answers>
</speech:qa>

<speech:qa id="qa2" runat="server"
           XpathDenyConfirms="/sml/deny"
           XpathAcceptConfirms="/sml/accept">
    <Prompt id="prompt2"
            PromptSelectFunction="myPromptFunction" />
    <Reco id="reco2" mode="automatic">
        <Grammars>
            <speech:grammar id="grm2"
                            src="http://mysite/cityANDdateANDyes_no.xml"
                            runat="server" />
        </Grammars>
    </Reco>
    <Confirms>
        <speech:answer id="conf1" XpathTrigger="/sml/departCity"
                       SemanticItem="siTb1" ConfirmThreshold="0.90"
                       runat="server" />
        <speech:answer id="conf2" XpathTrigger="/sml/arrivalCity"
                       SemanticItem="siTb2" ConfirmThreshold="0.90"
                       runat="server" />
        <speech:answer id="conf3" XpathTrigger="/sml/departDate"
                       SemanticItem="siTb3" ConfirmThreshold="1.00"
                       runat="server" />
    </Confirms>
</speech:qa>

<script>
function myPromptFunction(lastCommandOrException, count, assocArray)
{
    var promptText = "Did you say";

    // Prompt for the first item (in itinerary order) that has a value.
    if (assocArray["siTb1"] != null && assocArray["siTb1"] != "")
    {
        promptText += " from " + assocArray["siTb1"];
        return promptText;
    }

    if (assocArray["siTb2"] != null && assocArray["siTb2"] != "")
    {
        promptText += " to " + assocArray["siTb2"];
        return promptText;
    }

    if (assocArray["siTb3"] != null && assocArray["siTb3"] != "")
    {
        promptText += " on " + assocArray["siTb3"];
        return promptText;
    }
}
</script>

class Prompt : Control
{
    string id {get; set;}
    string type {get; set;}
    bool prefetch {get; set;}
    string lang {get; set;}
    bool bargein {get; set;}
    string src {get; set;}
    string PromptSelectFunction {get; set;}
    string OnClientBookmark {get; set;}
    string OnClientError {get; set;}
    string InlinePrompt {get; set;}
    string StyleReference {get; set;}
    ParamCollection Params {get; set;}
}



15.1 Prompt Properties

All properties of the Prompt object are available at design
time.



type 

Optional. Only used in voice-only mode. The MIME type
corresponding to the speech output format used. No default
value. The type attribute mirrors the type attribute on the
SALT Prompt object.

prefetch

Optional. Only used in voice-only mode. Flag to indicate
whether the prompt should be immediately synthesized and
cached at the browser when the page is loaded. Default value
is false. The prefetch attribute mirrors the prefetch
attribute on the SALT Prompt object.

lang 

Optional. Only used in voice-only mode. Specifies the
language of the prompt content. The value of this attribute
follows the RFC xml:lang definition. Example: lang="en-us"
denotes US English. No default value. If specified, this
overrides the value set in the Web.config file. The lang
attribute mirrors the lang attribute on the SALT Prompt
object.

bargein 

Optional. Only used in voice-only mode. Flag that
indicates whether or not the speech platform is responsible
for stopping prompt playback when speech or DTMF input is
detected. If true, the platform will stop the prompt in
response to input and flush the prompt queue. If false, the
platform will take no default action. If unspecified,
defaults to true.

PromptSelectFunction

Optional. Only used in voice-only mode. Specifies a
client-side function that allows authors to select and/or
modify a prompt string prior to playback. The function
returns the prompt string. PromptSelectFunction is called
once the QA has been activated and before the prompt
playback begins. If PromptSelectFunction is specified, src
and InLinePrompt are ignored.

The signature for PromptSelectFunction is as follows:

String PromptSelectFunction(string lastCommandOrException,
                            int Count, object SemanticItemList)

where:

lastCommandOrException is a Command type (e.g.,
"Help") or a Reco event (e.g., "Silence" or "NoReco").

Count is the number of times the QA has been activated
consecutively. Count starts at 1 and has no limit.

SemanticItemList: for voice-only mode, SemanticItemList is
an associative array that maps semantic item ids to semantic
item objects. For multimodal, SemanticItemList will be
null.

If the PromptSelectFunction is being called from within a
Prompt object specified by a CustomValidator control, the
SemanticItemList will contain the SemanticItem being
validated.

If the PromptSelectFunction is being called from within a
Prompt object specified by a CompareValidator control, the
SemanticItemList will contain the SemanticItem being
validated and (if specified) the SemanticItem to which it
is being compared.

OnClientBookmark

Optional. Only used in voice-only mode. Specifies a
client-side function which is called when a Bookmark is
reached in the prompt text during playback. The function
does not return a value. The signature for OnClientBookmark
is as follows:

function OnClientBookmark()

OnClientError

Optional. Only used in voice-only mode. Specifies a
client-side function which is called in response to an
error event in the client. Error events are generated from
the event object. The function returns a Boolean value. The
RunSpeech algorithm will continue executing if an
OnClientError script returns true. The RunSpeech algorithm
will navigate to the default error page specified in the
Web.config file if an OnClientError script returns false or
if an error occurs and the OnClientError function is not
specified. When navigating to the error page, both status
and description will be passed in the query string. For
example, if the error page is http://myErrorPage, RunSpeech
will navigate to http://myErrorPage?status=X&description=Y
(where X is the status code associated with the error and Y
is the description of that error given in the Speech Tags
Specification). The signature for OnClientError is as
follows:

bool OnClientError(int status)

where status is the code returned in the event object.
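The error-handling contract above can be sketched as follows. The function `handleClientError` and its return convention (`null` to keep running, otherwise the URL to navigate to) are assumptions for illustration, and the status value in the example is illustrative rather than a documented code. `URLSearchParams` takes care of encoding the description, which is why spaces appear as `+`.

```javascript
// Hypothetical sketch of RunSpeech-style error handling on the client.
function handleClientError(status, description, onClientError, errorPage) {
  if (typeof onClientError === "function" && onClientError(status) === true) {
    return null; // handler asked RunSpeech to continue executing
  }
  // No handler, or handler returned false: navigate to the error page.
  const query = new URLSearchParams({ status: String(status), description });
  return `${errorPage}?${query.toString()}`;
}

console.log(handleClientError(2, "Prompt failed", undefined, "http://myErrorPage"));
// http://myErrorPage?status=2&description=Prompt+failed
console.log(handleClientError(2, "Prompt failed", () => true, "http://myErrorPage"));
// null
```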

Note: For the SDK Beta release, it is advisable to specify
a default error page using the syntax described in Section
5, Global Application Settings.

InlinePrompt 

Optional. Only used in voice-only mode. The text of the
prompt to be played. It may contain further markup, such as
TTS rendering information or <value> elements. If a
PromptSelectFunction function is specified, the
InlinePrompt is ignored.

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Prompt object will search for the named Style control and
will use any property values specified on the Style as
default values for its own properties. Property values set
explicitly on the Prompt object override those set on the
referenced Style.

Params 

Optional. A collection of param objects that specify
additional, non-standard configuration parameter values to
the speech platform. The exact nature of the configuration
parameters will differ according to the proprietary
platform used. Values of parameters may be specified in an
XML namespace, in order to allow complex or structured
values. An exception will be thrown if the Params
collection contains a non-param object.

For example, the following syntax could be used to specify
the location of a remote prompt engine for distributed
architectures:

<Params>
    <speech:param name="promptServer"
                  runat="server">//myplatform/promptServer</speech:param>
</Params>

16 Reco object

Reco is rendered for both multimodal and voice-only modes.

The Reco object is used to specify speech input resources
and features as well as to provide for the management of
cases when valid recognition results are not returned.

How Reco object is used
Voice-only

During the processing of the Prompt object, RunSpeech
determines whether or not the currently active QA contains
a Reco object. If it does, RunSpeech asks the speech
platform to start the recognition process using the grammar
specified by the Reco's Grammar object. RunSpeech calls
the function specified by OnClientListening immediately
after activating the Reco's underlying <listen> tag. The
recognition process is stopped depending on the value of
the mode attribute. RunSpeech processes successful
recognition results using information specified in the
Answer object.

RunSpeech uses the Reco object to handle the situations
when the speech platform is not able to return valid
recognition results, i.e., speech platform errors,
timeouts, silence, or inability of the speech platform to
recognize an utterance. In each of these cases, RunSpeech
calls the appropriate handler (if specified) after setting
the value of the lastCommandOrException attribute.
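The exception path can be sketched as a small dispatcher. The event names `Silence` and `NoReco` follow the examples given for lastCommandOrException elsewhere in this section; the `Error` name and the mapping to handler fields on a `reco` object are hypothetical simplifications.

```javascript
// Hypothetical sketch: record the event, then call the matching handler.
function handleRecoException(dialog, reco, eventName) {
  const handlers = {
    Silence: reco.onClientSilence,
    NoReco: reco.onClientNoReco,
    Error: reco.onClientError,
  };
  if (!Object.prototype.hasOwnProperty.call(handlers, eventName)) {
    throw new Error(`unexpected reco event: ${eventName}`);
  }
  dialog.lastCommandOrException = eventName; // set before calling the handler
  const handler = handlers[eventName];
  if (typeof handler === "function") {
    handler(); // handlers are optional
  }
}

const dialog = { lastCommandOrException: "" };
const calls = [];
const reco = { onClientSilence: () => calls.push(dialog.lastCommandOrException) };
handleRecoException(dialog, reco, "Silence");
handleRecoException(dialog, reco, "NoReco"); // no handler set: only records
console.log(calls.join(","), dialog.lastCommandOrException); // Silence NoReco
```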

Multimodal

The Reco object is used by the Multimodal.js client-side
script just as it is used by the RunSpeech voice-only
client-side script (as described above), with one exception:
starting/stopping the recognition process. Multimodal
scenarios do not require speech output as a mechanism to
prompt the user for input. In fact, prompting in speech
controls is not available in multimodal scenarios as the
Prompt object is not rendered to the client. Therefore, an
alternate mechanism is required to start the recognition
process.



Multimodal.js uses the event specified in the
StartElement/StartEvent attributes to start the recognition
process. The function specified by the OnClientListening
attribute is called after the recognition process has
started. Multimodal.js uses the combination of the
StopEvent and mode attributes to stop the recognition
process.



class Reco : Control
{
    string id {get; set;}
    string StartElement {get; set;}
    string StartEvent {get; set;}
    string StopElement {get; set;}
    string StopEvent {get; set;}
    int initialTimeout {get; set;}
    int babbleTimeout {get; set;}
    int maxTimeout {get; set;}
    int endSilence {get; set;}
    float reject {get; set;}
    string mode {get; set;}
    string lang {get; set;}
    string GrammarSelectFunction {get; set;}
    string OnClientSpeechDetected {get; set;}
    string OnClientSilence {get; set;}
    string OnClientNoReco {get; set;}
    string OnClientError {get; set;}
    string StyleReference {get; set;}
    GrammarCollection Grammars {get; set;}
    ParamCollection Params {get; set;}
    Control record {get; set;}
}



15.1 Reco Properties 

All properties are available at design time. 
StartElement

Optional, but must be present if StartEvent is specified.
Used only in multimodal mode. Specifies the name of the
GUI element with which the start of the Reco is associated.
See StartEvent. No default value.

StartEvent

Optional, but must be present if StartElement is specified.
Used only in multimodal mode. Specifies the name of the
event that will activate (start) the underlying client-side
Reco object. See StartElement. No default value.


StopElement

Optional, but must be present if StopEvent is specified.
Used only in multimodal mode. Specifies the name of the
GUI element with which the stop of the Reco is associated.
See StopEvent. No default value.

StopEvent

Optional, but must be present if StopElement is specified.
Used only in multimodal mode. Specifies the name of the
event that will stop the underlying client-side Reco
object. See StopElement. No default value.

StartEvent and StopEvent will be used in multimodal
applications, typically for tap-and-talk interactions, e.g.,
StartEvent=Button1.onmousedown,
StopEvent=Button1.onmouseup.

StartEvent and StopEvent are allowed to be the same (click
to start, click to stop). However, it is the author's
responsibility to de-activate Recos before starting new
ones in the case where the end user fires two StartEvents in
succession (e.g., clicking on one control to start a Reco,
then clicking on a different control to start another Reco
before stopping the first).
Note: IE requires exact case when running JScript.
Therefore, the case of event values specified in the
StartEvent and StopEvent attributes must exactly match
the way those events are defined. For example, the onmouseup
and onmousedown events are specified in all lowercase letters.

Note: StartEvent and StopEvent are not rendered for
voice-only mode.
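To make the start/stop rules concrete, the following minimal Python sketch models what a page author has to do when two StartEvents fire in succession. It is not Multimodal.js: the `RecoCoordinator` class and its methods are hypothetical, and events are modeled as plain case-sensitive strings such as `Button1.onmousedown`.

```python
from typing import Dict, List, Optional, Tuple


class RecoCoordinator:
    """Hypothetical model of tap-and-talk wiring between GUI events and Recos."""

    def __init__(self) -> None:
        self.start_events: Dict[str, str] = {}   # StartEvent -> Reco id
        self.stop_events: Dict[str, str] = {}    # StopEvent -> Reco id
        self.active: Optional[str] = None        # id of the Reco currently listening
        self.log: List[Tuple[str, str]] = []

    def bind(self, reco_id: str, start_event: str, stop_event: str) -> None:
        self.start_events[start_event] = reco_id
        self.stop_events[stop_event] = reco_id

    def fire(self, event: str) -> None:
        # Dictionary lookups are case-sensitive, like event names under JScript in IE.
        # Stop is checked first so StartEvent == StopEvent gives click-to-start, click-to-stop.
        if self.active is not None and self.stop_events.get(event) == self.active:
            self._stop()
            return
        reco_id = self.start_events.get(event)
        if reco_id is None:
            return
        # The author's responsibility: stop the running Reco before starting another.
        self._stop()
        self.active = reco_id
        self.log.append(("start", reco_id))

    def _stop(self) -> None:
        if self.active is not None:
            self.log.append(("stop", self.active))
            self.active = None
```

With StartEvent and StopEvent bound to the same `onclick` event, the first click starts a Reco and the second stops it; clicking a second button while the first Reco is listening stops the first one before starting the second.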


initialTimeout

Optional. Used in both multimodal and voice-only modes. The
maximum time in milliseconds between the start of recognition
and the detection of speech. This value is passed to the
recognition platform, and if it is exceeded, an onSilence
event will be thrown from the recognition platform. If the
attribute is not specified, the speech platform will use its
own default value; the control itself has no default value.
An exception will be thrown for non-integer or negative
integer values.


Note: The sum of the initialTimeout and babbleTimeout
values should be less than or equal to the global maxTimeout
attribute, or to the Reco attribute maxTimeout (see below)
if it is set.


Note: The initialTimeout attribute mirrors the
initialTimeout attribute on the SALT Reco object.

babbleTimeout

Optional. Used in both multimodal and voice-only modes.
The maximum period of time in milliseconds for an
utterance. For Recos in "automatic" and "single" mode, this
applies to the period between speech detection and the
speech endpoint or stop call. For Recos in "multiple" mode,
this timeout applies to the period between speech detection
and each phrase recognition, i.e., the period is restarted
after each return of results or other event. If exceeded,
the onnoreco event is thrown with status code -15. This can
be used to control when the recognizer should stop
processing excessive audio. For automatic mode listens,
this will happen for exceptionally long utterances, for
example, or when background noise is mistakenly interpreted
as continuous speech. For single mode listens, this may
happen if the user keeps the audio stream open for an
excessive amount of time (e.g., by holding down the stylus
in tap-and-talk). If the attribute is not specified, the
speech platform will use its own default value; the control
itself has no default value. An exception will be thrown
for non-integer or negative integer values.

Note: The sum of the initialTimeout and babbleTimeout
values should be less than or equal to the global maxTimeout
attribute, or to the Reco attribute maxTimeout (see below)
if it is set.

Note: The babbleTimeout attribute mirrors the babbleTimeout
attribute on the SALT Reco object.

maxTimeout

Optional. Used in both multimodal and voice-only modes. The
period of time in milliseconds between recognition start
and results returned to the browser. If exceeded, an
OnError event is thrown by the browser; this guards against
network or recognizer failure in distributed environments.
For Recos in "multiple" mode, as with babbleTimeout, the
period is restarted after the return of each recognition or
other event. No default value. An exception will be thrown
for non-integer or negative integer values.

Note: maxTimeout should be greater than or equal to the sum
of initialTimeout and babbleTimeout. If specified, the
value of this attribute overrides the value of maxTimeout
set in the Web.config file. No default value.

Note: The maxTimeout attribute mirrors the maxTimeout
attribute on the SALT Reco object.
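The timeout rules above combine a hard check (non-negative integers, otherwise an exception) with a recommendation (the sum of initialTimeout and babbleTimeout should not exceed the effective maxTimeout). A minimal Python sketch of such a check might look like this; the function name `check_reco_timeouts` is hypothetical, and treating the recommendation as a warning rather than an error is an assumption.

```python
from typing import List, Optional


def check_reco_timeouts(initial: Optional[int] = None, babble: Optional[int] = None,
                        max_timeout: Optional[int] = None,
                        config_max: Optional[int] = None) -> List[str]:
    """Raise for invalid values; return warnings for the 'should' rule on the sum."""
    for name, value in (("initialTimeout", initial), ("babbleTimeout", babble),
                        ("maxTimeout", max_timeout)):
        if value is None:
            continue  # unset: the speech platform falls back to its own default
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    warnings: List[str] = []
    # A maxTimeout set on the Reco overrides the value taken from Web.config.
    effective_max = max_timeout if max_timeout is not None else config_max
    if initial is not None and babble is not None and effective_max is not None:
        if initial + babble > effective_max:
            warnings.append("initialTimeout + babbleTimeout exceeds maxTimeout")
    return warnings
```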

endSilence

Optional. Used in both multimodal and voice-only modes. For
Reco objects in "automatic" mode, the period of silence, in
milliseconds, that must follow the end of an utterance
before the recognition results are returned. Ignored for
Recos in modes other than "automatic". If not specified,
defaults to the platform's internal value. An exception
will be thrown for non-integer or negative integer values.

reject

Optional. Used in both multimodal and voice-only modes.
Specifies the rejection threshold, below which the platform
will throw the noReco event. If not specified, the speech
platform will use an internal default value. Legal values
range from 0 to 1, and their interpretation is platform
specific. An exception will be thrown for out-of-range
reject values. Default is 0.
lang

Optional. Used in both multimodal and voice-only modes.
Specifies the language of the speech recognition engine.
The value of this attribute follows the RFC xml:lang
definition. Example: lang="en-us" denotes US English. No
default value. This overrides the global setting in the
Web.config file. The lang attribute mirrors the lang
attribute on the SALT Reco object.

mode

Optional. Used in both multimodal and voice-only modes.
Specifies the recognition mode to be followed. Default is
"automatic". Legal values are "automatic", "single", and
"multiple".

Mode="automatic"

Used for recognitions in telephony scenarios. The speech
platform itself (not the application) is in control of when
to stop the recognition process. Mode="automatic" is the
only mode setting that works in voice-only mode; other modes
will be ignored and "automatic" will be used.

Mode=" single" 

Used for multimodal (tap-to-talk) scenarios. The return of 
a recognition result is under the control of an explicit 
25 call to stop the recognition process by the application. 
However, exceeding babbleTimeout or maxTimeout will stop 
recognition. Mode=" single" is ignored for voice-only. 

Mode=" multiple" 

30 Used for "open-microphone" or dictation scenarios. 
Recognition results are returned at intervals until the 
application makes an explicit call to stop the recognition 
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process (or babbleTimeout or maxTimeout periods are 
exceeded) . Multiple mode recos are not supported in voice- 
only mode dialogs. If the browser is a voice-only browser 
and reco mode is set to "multiple" , an exception will be 
5 thrown at render time. Mode=" multiple" is ignored for 
voice-only. 



GrammarSelectFunction

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script that will be called prior to
starting the recognition process. The script is written by
the dialog author and may be used to select or modify the
Grammar objects associated with the Reco object. The script
may also be used to adjust speech recognition features or
confidence/rejection thresholds. The GrammarSelectFunction
function does not return values. The signature for
GrammarSelectFunction is as follows:

function GrammarSelectFunction(object recoObj,
    string lastCommandOrException, int Count,
    object SemanticItemList)

where:

recoObj is the Reco object about to start.

lastCommandOrException is a Command type (e.g., "Help") or
a Reco event (e.g., "Silence" or "NoReco"). For multimodal
dialogs, lastCommandOrException will be an empty string.

Count is the number of times the QA containing the Reco
object has been activated consecutively. Count starts at 1
and has no limit. For multimodal dialogs, Count will be
zero.

SemanticItemList: for voice-only mode, SemanticItemList is
an associative array that maps semantic item ids to semantic
item objects. For multimodal dialogs, SemanticItemList will
be null.
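GrammarSelectFunction is ordinary client-side script, so its logic is entirely up to the author. The Python sketch below only illustrates the kind of decision its parameters allow. `RecoStub`, `active_grammars`, and the grammar ids `mainGrammar`, `helpGrammar` and `yesNoGrammar` are hypothetical and do not correspond to a real client-side API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RecoStub:
    """Hypothetical stand-in for the client-side Reco object passed as recoObj."""
    id: str
    active_grammars: List[str] = field(default_factory=list)


def grammar_select_function(reco_obj: RecoStub, last_command_or_exception: str,
                            count: int,
                            semantic_item_list: Optional[Dict[str, object]]) -> None:
    """Author-written selection logic; returns nothing and mutates reco_obj in place."""
    # semantic_item_list is unused here; it is None in multimodal dialogs.
    if last_command_or_exception == "Help":
        reco_obj.active_grammars = ["helpGrammar"]
    elif last_command_or_exception in ("Silence", "NoReco") and count >= 3:
        # After repeated failures on the same QA, fall back to a simpler grammar.
        reco_obj.active_grammars = ["yesNoGrammar"]
    else:
        reco_obj.active_grammars = ["mainGrammar"]
```

In a multimodal dialog the sketch always selects `mainGrammar`, because lastCommandOrException is empty and Count is zero there.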

OnClientSpeechDetected 

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script function that will be called
when the onspeechdetected event is fired by the speech
recognition platform on the detection of speech.
Determining the actual time of firing is left to the
platform (which may be configured on certain platforms
using the <param> element). This may be anywhere between
simple energy detection (early) and complete phrase or
semantic value recognition (late). This event also triggers
onbargein on a prompt which is in play, and may disable the
initial timeout of a started DTMF object. This function can
be used in multimodal scenarios, for example, to generate a
graphical indication that recognition is occurring, or in
voice-only scenarios to enable fine control over other
processes underway during recognition. The function does
not return any values. The signature for
OnClientSpeechDetected is as follows:

function OnClientSpeechDetected()

If a Dtmf object is active when the OnClientSpeechDetected
function is called, the timeouts of the Dtmf object will be
disabled.
OnClientSilence

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script that will be called after
detecting silence (in response to the SALT reco onSilence
event). The function does not return any values. The
signature for OnClientSilence is as follows:

function OnClientSilence(int status)

where status is the code returned in the event object.

If a Dtmf object is active when the OnClientSilence
function is called, the Dtmf object will be stopped.

OnClientNoReco

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side script that will be called after
detecting no recognition (in response to the SALT reco
onNoReco event). The function does not return any values.
The signature for OnClientNoReco is as follows:

function OnClientNoReco(int status)

where status is the code returned in the event object.

If a Dtmf object is active when the OnClientNoReco function
is called, the Dtmf object will be stopped.

OnClientError

Optional. Used in both multimodal and voice-only modes.
Specifies a client-side function which is called in
response to an error event in the client. Error events are
generated from the event object. The function returns a
boolean value. The RunSpeech algorithm will continue
executing if an OnClientError script returns true. The
RunSpeech algorithm will navigate to the default error page
specified in the Web.config file if an OnClientError script
returns false, or if an error occurs and the OnClientError
function is not specified. When navigating to the error
page, both status and description will be passed in the
query string. For example, if the error page is
http://myErrorPage, we will navigate to
http://myErrorPage?status=X&description=Y (where X is the
status code associated with the error and Y is the
description of that error given in the Speech Tags
Specification). The signature for OnClientError is as
follows:

bool OnClientError(int status)

where status is the code returned in the event object.

Note: the return value of OnClientError is ignored in
multimodal mode.

If a Dtmf object is active when the OnClientError function
is called, the Dtmf object will be stopped.
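For voice-only dialogs the OnClientError contract reduces to a small decision: continue if the handler returns true, otherwise navigate to the error page with status and description in the query string. The following Python sketch assumes a hypothetical `handle_reco_error` helper. The status value and description in the test lines are illustrative only, and appending with `&` when the error page URL already has a query string is an assumption not stated above.

```python
from typing import Callable, Optional
from urllib.parse import urlencode


def handle_reco_error(status: int, description: str, error_page: str,
                      on_client_error: Optional[Callable[[int], bool]] = None) -> Optional[str]:
    """Return None if RunSpeech continues, else the error-page URL to navigate to."""
    if on_client_error is not None and on_client_error(status):
        return None  # handler returned true: RunSpeech keeps executing
    # Handler missing or returned false: pass status and description in the query string.
    query = urlencode({"status": status, "description": description})
    separator = "&" if "?" in error_page else "?"
    return f"{error_page}{separator}{query}"
```

`urlencode` form-encodes the values, so spaces in the description become `+`.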

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Reco object will search for the named Style control and
will use any property values specified on the Style as
default values for its own properties. Property values set
explicitly on the Reco object override those set on the
referenced Style.
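The StyleReference lookup behaves like a two-level dictionary merge in which the Style supplies defaults and values set explicitly on the control win. Here is a minimal Python sketch. The function name and property names are hypothetical, and reporting a missing Style as an error is an assumption that the text does not state.

```python
from typing import Dict, Optional


def resolve_properties(explicit: Dict[str, object],
                       styles: Dict[str, Dict[str, object]],
                       style_reference: Optional[str]) -> Dict[str, object]:
    """Merge a referenced Style's values as defaults underneath explicitly set values."""
    if style_reference is None:
        return dict(explicit)
    if style_reference not in styles:
        raise KeyError(f"Style control '{style_reference}' not found")
    resolved = dict(styles[style_reference])  # Style values act as defaults
    resolved.update(explicit)                 # explicit values on the control win
    return resolved
```

The same rule applies to the StyleReference properties of the Grammar and Dtmf objects.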

Grammars

Optional. An array of grammar objects as specified below. 
An exception will be thrown if a Grammars collection 
contains a non-grammar object. 

Params

Optional. Used in both multimodal and voice-only modes. A
collection of param objects that specify additional,
non-standard configuration parameter values to the speech
platform. The exact nature of the configuration parameters
will differ according to the proprietary platform used.
Values of parameters may be specified in an XML namespace,
in order to allow complex or structured values. An
exception will be thrown if the Params collection contains
a non-param object.


For example, the following syntax could be used to specify 
the location of a remote speech recognition server for 
distributed architectures: 

<Params>
    <speech:param name="recoServer" runat="server">//myplatform/recoServer</speech:param>
</Params>




record 

Optional. Used in both multimodal and voice-only modes. The
record object is used for recording audio input from the
user. Recording may be used in addition to recognition or
in place of it, according to the abilities of the platform
and its profile. Only one record object is permitted in a
single <reco>.

17 Grammar object 

The grammar object contains information on the selection
and content of grammars, and the means for processing
recognition results. All the properties defined are
read/write properties.

class Grammar : Control
{
    string id {get; set;};
    string type {get; set;};
    string lang {get; set;};
    string src {get; set;};
    string InLineGrammar {get; set;};
    string StyleReference {get; set;};
}

17.1 Grammar Properties

Grammar is rendered for both multimodal and voice-only
modes. All properties are available at design time and run
time.

type

Optional. Used in both multimodal and voice-only modes. The
MIME type corresponding to the grammar format used. No
default value. The type attribute mirrors the type
attribute on the SALT Grammar object.
lang

Optional. Used in both multimodal and voice-only modes.
String indicating which language the grammar refers to. The
value of this attribute follows the RFC xml:lang
definition. Example: lang="en-us" denotes US English. No
default value. Overrides the global value set in the
Web.config file. The lang attribute mirrors the lang
attribute on the SALT Grammar object.


src 

Optional. Used in both multimodal and voice-only modes.
Specifies the URI of the grammar to load. If an inline
grammar and src are both specified, the inline grammar takes
precedence and src is ignored. The src attribute mirrors
the src attribute on the SALT Grammar object. An exception
will be thrown if neither src nor InlineGrammar is
specified.

InlineGrammar

Optional. Used in both multimodal and voice-only modes.
InlineGrammar accesses the text of the grammar specified
inline. If InlineGrammar and src are both specified,
InlineGrammar takes precedence and src is ignored. An
exception will be thrown if neither src nor InlineGrammar is
specified.
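The precedence rule shared by src and InlineGrammar can be stated in a few lines. This Python sketch uses the hypothetical function `select_grammar_source`. Treating an empty string the same as an unspecified property is an assumption, and the URIs in the test are placeholders.

```python
from typing import Optional, Tuple


def select_grammar_source(src: Optional[str],
                          inline_grammar: Optional[str]) -> Tuple[str, str]:
    """Return ('inline', text) or ('src', uri) following the Grammar precedence rule."""
    if inline_grammar:  # the inline grammar takes precedence; src is ignored
        return ("inline", inline_grammar)
    if src:
        return ("src", src)
    raise ValueError("Grammar requires either src or InlineGrammar")
```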



Inline grammars must be HTML encoded; they are HTML
encoded when sent down to the server. Authors must use &gt;
for > and &lt; for <, and adhere to all other HTML encoding
standards. It is recommended that authors use the property
builder in DET, which will handle the HTML encoding
automatically.
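Python's standard `html.escape` performs the kind of encoding described here, and can be used to check what an HTML-encoded inline grammar looks like. In this sketch `encode_inline_grammar` is a hypothetical helper. Passing `quote=False` restricts escaping to `&`, `<` and `>`.

```python
import html


def encode_inline_grammar(grammar_xml: str) -> str:
    """HTML-encode grammar markup before placing it in the InlineGrammar property."""
    # html.escape replaces & first, then < and >; quote=False leaves quotation marks alone.
    return html.escape(grammar_xml, quote=False)
```

`html.unescape` reverses the encoding, which is a quick way to confirm that nothing was double-encoded.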

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Grammar object will search for the named Style control and
will use any property values specified on the Style as
default values for its own properties. Property values set
explicitly on the Grammar object override those set on the
referenced Style.

18 Dtmf object 

Dtmf may be used by QA controls in telephony applications.
The Dtmf object essentially applies a different modality of
grammar (a keypad input grammar rather than a speech input
grammar) to the same question.

class Dtmf : Control
{
    string id {get; set;};
    bool preflush {get; set;};
    int initialTimeOut {get; set;};
    int interDigitTimeOut {get; set;};
    int endSilence {get; set;};
    string OnClientSilence {get; set;};
    string OnClientKeyPress {get; set;};
    string OnClientError {get; set;};
    string StyleReference {get; set;};
    ParamCollection Params {get; set;};
    GrammarCollection Grammars {get; set;};
}

18.1 Dtmf Properties 

All properties are available at design time.

Preflush

Optional. Flag to indicate whether to automatically flush
the DTMF buffer on the underlying telephony interface card
before activation. Default is "false" (to enable type-ahead
functionality). The preflush attribute mirrors the preflush
attribute on the SALT DTMF object.

InitialTimeOut

Optional. The number of milliseconds to wait for the first
key press before raising a timeout event. If this timeout
occurs, the DTMF collection ends automatically. If
unspecified, initialTimeout defaults to a telephony
platform internal setting. An exception is thrown if
initialTimeout is a negative value. The initialTimeout
attribute mirrors the initialTimeout attribute on the SALT
DTMF object.

InterdigitTimeOut

Optional. The timeout period in milliseconds between
adjacent DTMF key presses before raising a timeout event.
If this timeout occurs, the DTMF collection ends
automatically. If unspecified, interdigitTimeout defaults
to a telephony platform internal setting. An exception is
thrown if interdigitTimeout is a negative value. The
interdigitTimeout attribute mirrors the interdigitTimeout
attribute on the SALT DTMF object.

EndSilence

Optional. The timeout period in milliseconds when input
matches a complete path through the grammar but further
input is still possible. This timeout specifies the period
of time in which further input is permitted after the
complete match. Once exceeded, onreco is thrown. (For a
complete grammar match where further input is not possible,
the endsilence period is not required, and onreco is thrown
immediately.) If this attribute is not supported directly
by a platform, or is unspecified in the application, the
value of endsilence defaults to that used for
interdigittimeout. An exception is thrown if endsilence is a
negative value.
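The interplay of InitialTimeOut, InterdigitTimeOut and EndSilence can be simulated over a list of timestamped key presses. This Python sketch is a hypothetical model, not platform code. The function `collect_dtmf` and the grammar predicates `is_complete` and `can_extend` are assumptions, and so is reporting an interdigit timeout on incomplete input as a no-recognition result.

```python
from typing import Callable, List, Tuple

Press = Tuple[int, str]  # (milliseconds since activation, key)


def collect_dtmf(presses: List[Press], initial_timeout: int, interdigit_timeout: int,
                 end_silence: int, is_complete: Callable[[str], bool],
                 can_extend: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (event, digits), where event is 'silence', 'reco' or 'noreco'."""
    digits = ""
    last = 0
    for t, key in presses:
        gap = t - last
        if not digits:
            if gap > initial_timeout:
                return ("silence", "")  # no key press before initialTimeout expired
        else:
            # After a complete match EndSilence governs; otherwise InterdigitTimeOut.
            limit = end_silence if is_complete(digits) else interdigit_timeout
            if gap > limit:
                return ("reco", digits) if is_complete(digits) else ("noreco", digits)
        digits += key
        last = t
        if is_complete(digits) and not can_extend(digits):
            return ("reco", digits)  # no further input possible: onreco immediately
    # No more presses: whichever timer is running eventually expires.
    if not digits:
        return ("silence", "")
    return ("reco", digits) if is_complete(digits) else ("noreco", digits)
```

Following the text above, a platform that does not support EndSilence would be modeled by passing the interdigit value for `end_silence`.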

OnClientSilence 

Optional. Specifies a client-side script function to be
called if there is no DTMF key press before initialTimeout
expires. The platform halts DTMF collection automatically.
The QA treats this as a silence. The function returns no
values. The signature for OnClientSilence is as follows:

function OnClientSilence()

If a Reco object is active when the OnClientSilence
function is called, the Reco object will be stopped.

OnClientKeyPress

Optional. Specifies a client-side script function that is
called on every press of a DTMF key which is legal
according to the input grammar. If a prompt is in playback,
the onkeypress event will trigger the onbargein event on
the prompt (and cease its playback if the prompt's bargein
attribute is set to true). If a Reco object is active, the
first onkeypress event will disable the timeouts of the
Reco object.
OnClientError

Optional. Specifies a client-side function which is called
in response to a serious or fatal error with the DTMF
collection/recognition process. Error events are generated
from the event object. The function returns a boolean
value. The RunSpeech algorithm will continue executing if
an OnClientError script returns true. The RunSpeech
algorithm will navigate to the default error page specified
in the Web.config file if an OnClientError script returns
false, or if an error occurs and the OnClientError function
is not specified. When navigating to the error page, both
status and description will be passed in the query string.
For example, if the error page is http://myErrorPage, we
will navigate to http://myErrorPage?status=X&description=Y
(where X is the status code associated with the error and Y
is the description of that error given in the Speech Tags
Specification). The signature for OnClientError is as
follows:

bool OnClientError(int status)

where status is the code returned in the event object.

If a Reco object is active when the OnClientError function
is called, the Reco object will be stopped.

OnClientNoReco

Optional. Specifies a client-side function which is called
in response to a failure to recognize by the DTMF
collection/recognition process. This is most likely to
occur when the input detected does not match a path
through the active grammars. The function does not need to
return a value. The prototype for the function is:

function OnClientNoReco(int status)

where status is the code returned in the event object.

StyleReference

Optional. Used in both multimodal and voice-only modes.
Specifies the name of a Style object. At render time, the
Dtmf object will search for the named Style control and
will use any property values specified on the Style as
default values for its own properties. Property values set
explicitly on the Dtmf object override those set on the
referenced Style.

Grammars 

Optional. An array of grammar objects. 


Params 

A collection of param objects that specify additional,
non-standard configuration parameter values to the speech
platform. The exact nature of the configuration parameters
will differ according to the proprietary platform used.
Values of parameters may be specified in an XML namespace,
in order to allow complex or structured values. An
exception will be thrown if the Params collection contains
a non-param object.


For example, the following syntax shows how to specify a
parameter on a particular DTMF platform:
<Params>
    <speech:param name="myDTMFParam" runat="server">myDTMFValue</speech:param>
</Params>



19 Param object 

The param object allows authors to specify the names and
values of additional, non-standard configuration parameters
to the speech platform. The exact nature of the
configuration parameters will differ according to the
proprietary platform used. Values of parameters may be
specified in an XML namespace, in order to allow complex or
structured values.


class param : Control
{
    string name {get; set;};
    string Value {get; set;};
}

Note that the value of a param object is specified between 
the param tags. 

19.1 Param Properties 


name 

Required. The name of the parameter to be configured. An 
exception will be thrown for <param> elements that do not 
contain the name attribute. 


Value 

Optional. The value which will be assigned to the named
parameter.
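Rendering a list of param objects into the <Params> markup shown earlier is straightforward with the standard library's XML escaping helpers. In this Python sketch `render_params` is hypothetical. It enforces the rule that every param must carry a name, and escapes values so that text containing `<` or `&` stays well formed.

```python
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape, quoteattr


def render_params(params: List[Tuple[Optional[str], str]]) -> str:
    """Render (name, value) pairs as <speech:param> elements inside <Params>."""
    lines = ["<Params>"]
    for name, value in params:
        if not name:
            raise ValueError("<param> elements must have a name attribute")
        # quoteattr adds the surrounding quotes; escape handles &, < and > in the value.
        lines.append(f'    <speech:param name={quoteattr(name)} runat="server">'
                     f"{escape(value)}</speech:param>")
    lines.append("</Params>")
    return "\n".join(lines)
```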
20 Record object

The record object is used to record audio input from the
user. Recording may be used in addition to recognition or
in place of it, according to the abilities of the platform
and its profile.

class record : Control
{
    bool enabled {get; set;};
    string type {get; set;};
    bool beep {get; set;};
}

20.1 Record Properties

enabled

Optional. Flag to indicate whether or not to record the
user input. Defaults to "false".

type

Optional. MIME type of the recording. MIME types can be
specified such as "audio/wav" for WAV (RIFF header) 8kHz
8-bit mono mu-law [PCM] single channel, or "audio/basic" for
raw (headerless) 8kHz 8-bit mono mu-law [PCM] single
channel. If unspecified, defaults to a G.711 wave file.


beep 

Optional. Boolean value; if true, the platform will play a
beep before recording begins. Defaults to false.

21 Call control

All call-related server-side controls deal with a single
device and a single active call at any given time. If the
dialog author needs to monitor more than one device or
handle more than one active call, the custom SmexMessage
can be used, and the author will have to handle CSTA
messages.

All call control controls are only used in voice-only mode. 

The SpeechControls.dll will implement a support class
(CallInfo), a base class (SmexMessageBase), and the
following WebControls:

SmexMessage
    o for custom/advanced CSTA messages, and messages to any
      non-CSTA <smex> elements by specifying a client-side
      <smex> element
TransferCall
    o for the CSTA SingleStepTransfer service
MakeCall
    o for the CSTA MakeCall service
DisconnectCall
    o for the CSTA ClearConnection service
AnswerCall
    o for the CSTA AnswerCall service



21.1 Common Classes 

21.1.1 CallInfo

class CallInfo
{
    string MonitorCrossRefId {get;};
    string DeviceId {get;};
    string CallId {get;};
    string CallingDevice {get;};
    string CalledDevice {get;};
}

21.1.1.1 CallInfo Properties

MonitorCrossRefId: The id returned by the start page's
MonitorStart.

DeviceId: The device id for the current active call.

CallId: The call id for the current active call. These
properties can be used in the custom SmexMessage object to
form the correct CSTA XML message on the web server side.



CallingDevice: This represents the calling device
information provided by the network (ANI, for example).
This information will always remain with the call and will
never change (unlike the callingDevice).

CalledDevice: This represents the called device information
provided by the network (DNIS, for example). This
information will always remain with the call and will never
change (unlike the calledDevice).

21.1.2 SmexMessageBase

This is an internal class. Authors who need to create new
call-control controls should derive from SmexMessage.



internal abstract class SmexMessageBase
{
    string ID {get; set;};
    int Timer {get; set;};
    bool AutoPostback {get; set;};
    string ClientActivationFunction {get; set;};
    string OnClientError {get; set;};
    string OnClientTimeout {get; set;};
    CallInfo CurrentCall {get;};
}



21.1.2.1 SmexMessageBase Properties

ID: ASP.NET control ids.

SpeechIndex: Same as for other speech controls. This index
controls the order of the object within RunSpeech. Default
0, meaning source order after all non-zero indexed speech
objects.

Timer: Number in milliseconds indicating the time span
before a timeout event will be triggered. This is set on the
client-side <smex> object before the CSTA message is sent.
The default is 0, meaning no timeout. An exception will be
thrown for negative values of Timer.

AutoPostback: Whether to cause a postback when the object's
event is fired. Default is false.

ClientActivationFunction: The client-side function called
by RunSpeech to determine whether an object is active. When
not specified, the object is considered active only once
(the PlayOnce behavior). ClientActivationFunction returns a
bool to indicate whether the associated object should be
active (true) or not (false). The signature for
ClientActivationFunction is:

function ClientActivationFunction(object sender)

where sender is the current object.
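The SpeechIndex ordering and the default PlayOnce activation can be modeled together. This Python sketch assumes a hypothetical `SpeechObject` record with a `has_run` flag. Sorting is stable, so objects with equal non-zero indexes keep source order; negative indexes, which the text does not address, would simply sort first.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpeechObject:
    """Hypothetical stand-in for a speech control as RunSpeech sees it."""
    name: str
    speech_index: int = 0
    activation: Optional[Callable[["SpeechObject"], bool]] = None
    has_run: bool = False


def run_order(objects: List[SpeechObject]) -> List[SpeechObject]:
    """Non-zero SpeechIndex objects first (ascending), then index-0 objects in source order."""
    indexed = sorted((o for o in objects if o.speech_index != 0),
                     key=lambda o: o.speech_index)
    return indexed + [o for o in objects if o.speech_index == 0]


def next_active(objects: List[SpeechObject]) -> Optional[SpeechObject]:
    """Return the first object whose activation check passes, or None."""
    for obj in run_order(objects):
        if obj.activation is None:
            active = not obj.has_run  # PlayOnce: active only until it has run once
        else:
            active = obj.activation(obj)
        if active:
            return obj
    return None
```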

OnClientError: Optional. Default is false when not
specified. The client-side function called when <smex>
fires the onerror event. OnClientError returns a bool:
true to continue RunSpeech and false to go to the error
page. The signature for OnClientError is:

function OnClientError(object sender, int status)

where

sender is the current object, and 

status is the value of the object's status property. 


OnClientTimeout: Optional. Default is true when not
specified. The client-side function called when <smex>
fires the ontimeout event. OnClientTimeout returns a bool:
true to continue RunSpeech and false to go to the error
page. The signature for OnClientTimeout is:

function OnClientTimeout(object sender)

where

sender is the current object.

CurrentCall: Returns the current active call object. 

21.2 Server-side Classes 

21.2.1 SmexMessage

This is a generic class for sending raw CSTA messages and
receiving CSTA events.



Since the number and types of events generated by this
message are unknown, the author needs to be careful about
when RunSpeech can continue:

RunSpeech will be paused just before calling the author's
OnClientBeforeSend function, when the message is about to
be sent.

If OnClientReceive is not specified, RunSpeech will resume
when any smex event is received after the message is sent.

If OnClientReceive is specified, the author returns true to
indicate that RunSpeech can resume after receiving the
expected event.

RunSpeech will resume after an Error or Timeout occurs.

The Smex Timer will be set to the given value before the
message is sent, and back to zero right before RunSpeech
resumes.
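The pause and resume rules for SmexMessage amount to a small state machine. The Python sketch below is hypothetical: `SmexMessageSketch`, its `on_event` method and the event kinds `receive`, `error` and `timeout` are illustrative names. The CSTA payloads in the test are placeholders rather than real CSTA XML.

```python
from typing import Callable, Optional

BeforeSend = Callable[[object, str], Optional[str]]
Receive = Callable[[object, str], bool]


class SmexMessageSketch:
    """Hypothetical model of when RunSpeech pauses and resumes around a SmexMessage."""

    def __init__(self, message: str, timer: int = 0,
                 on_client_before_send: Optional[BeforeSend] = None,
                 on_client_receive: Optional[Receive] = None) -> None:
        if not message:
            raise ValueError("Message is required")
        self.message = message
        self.timer = timer
        self.on_client_before_send = on_client_before_send
        self.on_client_receive = on_client_receive
        self.smex_timer = 0   # timer value on the client-side <smex> element
        self.paused = False

    def send(self) -> str:
        """Pause RunSpeech, let the author rewrite the message, arm the Smex timer."""
        self.paused = True
        outgoing = self.message
        if self.on_client_before_send is not None:
            replacement = self.on_client_before_send(self, self.message)
            if replacement is not None:  # None means: send the original message
                outgoing = replacement
        self.smex_timer = self.timer
        return outgoing

    def on_event(self, kind: str, payload: str = "") -> bool:
        """Feed an smex event; return True when RunSpeech resumes."""
        if not self.paused:
            return False  # unexpected event: this object is not the active one, ignore it
        if kind == "receive" and self.on_client_receive is not None:
            resume = self.on_client_receive(self, payload)
        else:
            resume = True  # error/timeout, or any event when OnClientReceive is unset
        if resume:
            self.smex_timer = 0  # reset just before RunSpeech resumes
            self.paused = False
        return resume
```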



When an unexpected smex event arrives, i.e., when the
current active object in RunSpeech is not a call-related
object, the smex event is ignored.

When AutoPostback is set to true, all events will execute
the client handler, then cause a postback to the web
server, where the corresponding server event will be fired.

class SmexMessage : SmexMessageBase
{
    string Message {get; set;};
    string ClientSmexId {get; set;};
    string OnClientBeforeSend {get; set;};
    string OnClientReceive {get; set;};
    event Receive;
}

21.2.1.1 SmexMessage Properties 

Message: Required. The CSTA XML message to be sent. An 
exception will be thrown if Message is not specified. 

OnClientBeforeSend: Optional. Client-side function called just before the message is sent. This gives the author a last chance to modify the message. OnClientBeforeSend returns a string containing the new message. If null is returned, the original message will be sent. The signature for OnClientBeforeSend is:

function OnClientBeforeSend(object sender, string Message)

where:

sender is the client-side SmexMessage object, and

Message is the original message.

Receive: Optional. Server-side event fired when the client-side <smex> object receives smex events. The signature of a ReceiveEventHandler is:

void ReceiveEventHandler(object sender, ReceiveEventArgs e)

where

sender will be the server-side SmexMessage object. The second argument e is of the following type:

class ReceiveEventArgs : EventArgs
{
    string Received {get;}
}

where

Received contains the event message received from <smex>.

OnClientReceive: Optional. Client-side function called when the client-side <smex> object receives smex events. OnClientReceive returns a bool: true means that this object has received all the events it expects and RunSpeech can continue; false means that this object expects more events before RunSpeech can continue. The signature for OnClientReceive is:

function OnClientReceive(object sender, string Message)

where

sender is the client-side SmexMessage object, and

Message is the received message.

ClientSmexId: Optional. This is the client-side <smex> element id. If not set, messages will be sent through the default Call Manager <smex> element. If set to a non-empty string, it has to be the id of an existing SALT <smex> element, which the author has to add to the page.

21.2.2 TransferCall

The TransferCall control transfers the current call using the CSTA SingleStepTransfer service. When RunSpeech runs this object, it blocks any further speech dialog until the transfer succeeds or fails.

class TransferCall : SmexMessageBase
{
    string TransferredTo {get; set;}
    string OnClientFailed {get; set;}
    string OnClientTransferred {get; set;}
    event EventHandler Transferred;
}



21.2.2.1 TransferCall Properties

TransferredTo: Required. The device identifier associated with the transferred-to endpoint.

Transferred: Optional. Server-side event fired when the call is transferred. The signature of an EventHandler is:

void EventHandler(object sender, EventArgs e);

where

sender is the server-side TransferCall object, and

e is of the standard EventArgs type.

OnClientTransferred: Optional. Client-side function called when the call is transferred. OnClientTransferred returns nothing. The signature of OnClientTransferred is:

function OnClientTransferred(object sender)

where:

sender is the client-side TransferCall object.

OnClientFailed: Client-side function called when CSTA returns a FAILED event. OnClientFailed returns a bool: true to continue RunSpeech and false to go to the error page. The signature for OnClientFailed is:

function OnClientFailed(object sender, string cause)

where

sender is the client-side TransferCall object, and

cause is the reason for failure returned from <smex>.
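The TransferCall outcome handling reduces to a dispatch on the CSTA result. This is a hypothetical sketch: the outcome object shape ({ kind: "transferred" } or { kind: "failed", cause }), the function name handleTransferOutcome, and the choice to go to the error page when no OnClientFailed handler is registered are all assumptions.

```javascript
// Hypothetical sketch: decide what RunSpeech does after a TransferCall outcome.
// Returns "continue" or "errorPage".
function handleTransferOutcome(sender, outcome, handlers) {
  if (outcome.kind === "transferred") {
    if (handlers.onClientTransferred) {
      handlers.onClientTransferred(sender); // OnClientTransferred returns nothing
    }
    return "continue";
  }
  if (outcome.kind === "failed") {
    // OnClientFailed: true continues RunSpeech, false goes to the error page.
    // Assumption: with no handler registered, go to the error page.
    const keepGoing = handlers.onClientFailed
      ? handlers.onClientFailed(sender, outcome.cause) === true
      : false;
    return keepGoing ? "continue" : "errorPage";
  }
  throw new Error("unknown outcome: " + outcome.kind);
}
```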

21.2.3 MakeCall

The MakeCall control makes an outbound call to the given number on the given device when RunSpeech runs this object. Further speech dialog is blocked until the call is connected or fails to connect.

class MakeCall : SmexMessageBase
{
    string CallingDevice {get; set;}
    string CalledDirectoryNumber {get; set;}
    string OnClientFailed {get; set;}
    string OnClientConnected {get; set;}
    event EventHandler Connected;
}



21.2.3.1 MakeCall Properties

CallingDevice: Required. Default is the internal CallInfo DeviceId. The control will use this device to place the outbound call.

CalledDirectoryNumber: Required. Phone number to dial. An exception will be thrown if CalledDirectoryNumber is not specified.

Connected: Server-side event fired when the call is connected. The signature of an EventHandler is:

void EventHandler(object sender, EventArgs e)

where

sender is the server-side MakeCall object, and

e is of the standard EventArgs type.

At this point, the CurrentCall property should contain the information about the call in progress.

OnClientConnected: Client-side function called when the call is connected. OnClientConnected returns nothing. The signature for OnClientConnected is:

function OnClientConnected(object sender, string CalledDirectoryNumber)

where:

sender is the client-side MakeCall object, and

CalledDirectoryNumber is the property of the MakeCall object.

OnClientFailed: Client-side function called when CSTA returns a FAILED event. OnClientFailed returns a bool: true to continue RunSpeech and false to go to the error page. The signature for OnClientFailed is:

function OnClientFailed(object sender, string cause)

where

sender is the client-side MakeCall object, and

cause is the reason for failure returned from <smex>.
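To make the MakeCall properties concrete, the sketch below assembles the kind of CSTA request the control could send. It assumes ECMA-323 style element names (MakeCall, callingDevice, calledDirectoryNumber) and omits the XML namespace. buildMakeCallRequest and its defaultDeviceId parameter, which stands in for the internal CallInfo DeviceId, are hypothetical.

```javascript
// Hypothetical sketch of turning MakeCall properties into a CSTA XML request.
// Element names follow ECMA-323 conventions (assumption); namespace omitted.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildMakeCallRequest(calledDirectoryNumber, callingDevice, defaultDeviceId) {
  if (!calledDirectoryNumber) {
    // Mirrors the documented exception when CalledDirectoryNumber is missing.
    throw new TypeError("CalledDirectoryNumber not specified");
  }
  // Fall back to the default device when CallingDevice is empty.
  const device = callingDevice || defaultDeviceId;
  return (
    "<MakeCall>" +
    "<callingDevice>" + escapeXml(device) + "</callingDevice>" +
    "<calledDirectoryNumber>" + escapeXml(calledDirectoryNumber) + "</calledDirectoryNumber>" +
    "</MakeCall>"
  );
}
```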
21.2.4 DisconnectCall

class DisconnectCall : SmexMessageBase
{
    string OnClientFailed {get; set;}
    string OnClientDisconnected {get; set;}
    event EventHandler Disconnected;
}

21.2.4.1 DisconnectCall Properties

Disconnected: Optional. Server-side event fired when the call is disconnected. The signature of an EventHandler is:

void EventHandler(object sender, EventArgs e)

where:

sender is the server-side DisconnectCall object, and

e is of the standard EventArgs type.

OnClientDisconnected: Optional. Client-side function called when the call is disconnected. OnClientDisconnected returns nothing. The signature for OnClientDisconnected is:

function OnClientDisconnected(object sender)

where sender is the client-side DisconnectCall object.

OnClientFailed: Optional. Client-side function called when CSTA returns a FAILED event. OnClientFailed returns a bool: true to continue RunSpeech and false to go to the error page. The signature for OnClientFailed is:

function OnClientFailed(object sender, string cause)

where

sender is the client-side DisconnectCall object, and

cause is the reason for failure returned from <smex>.

21.2.5 AnswerCall

The AnswerCall control answers incoming calls on the given device. When activated, this object will block RunSpeech until an incoming call is answered.

Server-side class:

class AnswerCall : SmexMessageBase
{
    string OnClientConnected {get; set;}
    string OnClientFailed {get; set;}
    event EventHandler Connected;
}

21.2.5.1 AnswerCall Properties

Connected: Optional. Server-side event fired when the call is connected. The signature of an EventHandler is:

void EventHandler(object sender, EventArgs e)

where:

sender is the server-side AnswerCall object, and

e is of the standard EventArgs type.

At this point, the CurrentCall property should contain information about the call in progress.

OnClientConnected: Optional. Client-side function called when the call is connected. OnClientConnected returns nothing. The signature for OnClientConnected is:

function OnClientConnected(object sender, string callid, string CallingDevice, string CalledDevice)

where:

sender is the client-side AnswerCall object,

callid is the id of the current call,

CallingDevice is the caller's network device id, and

CalledDevice is the recipient's network device id.

OnClientFailed: Optional. Client-side function called when CSTA returns a FAILED event. OnClientFailed returns a bool: true to continue RunSpeech and false to go to the error page. The signature of OnClientFailed is:

function OnClientFailed(object sender, string cause)

where

sender is the client-side AnswerCall object, and

cause is the reason for failure returned from <smex>.
22 RunSpeech 

22.1 Dialog Processing Algorithm 
The RunSpeech algorithm is used to drive dialog flow on a voice-only client. This involves system prompting, dialog management and processing of speech input. It is specified as a script file referenced by URI from every relevant speech-enabled page (equivalent to inline embedded script).

Important: the RunSpeech script will be completely exposed to the public. Since it will be hosted on the application web site, authors of dialogs will be at liberty to examine, edit, replace or ignore the RunSpeech script code.

Rendering of the page for voice-only browsers is done in the following manner. The RunSpeech function works as follows (RunSpeech is called in response to document.readyState becoming "complete"):

Controls considered for activation are the QA, 
CompareValidator and CustomValidator controls. 

1. Find the first active QA or Validator control in 
speech index order (determining whether a QA/Validator 
is active is explained below) . 

2. If there is no active control, submit the page. 

3. Otherwise, run the control. 

A QA is considered active if and only if:

1. The QA's ClientActivationFunction either is not present or returns true, AND

2. If the Answers collection is non-empty, the State of at least one of the SemanticItems pointed to by the set of Answers is Empty, OR

3. If the Answers collection is empty, the State of at least one SemanticItem in the Confirm array is NeedsConfirmation.

However, if the QA has PlayOnce true and its Prompt has 
been run successfully (reached OnComplete) the QA will not 
be a candidate for activation. 
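The activation rules above, including the PlayOnce exception, can be expressed as a predicate. This is a minimal sketch over a hypothetical object model (answers pointing at semantic items with a state field, a promptCompleted flag, and a ClientActivationFunction called without arguments); the Validator controls that RunSpeech also considers are left out.

```javascript
// Hypothetical object model; state names follow the text above.
const STATE_EMPTY = "Empty";
const STATE_NEEDS_CONFIRMATION = "NeedsConfirmation";

function isQaActive(qa) {
  // PlayOnce QAs whose prompt already reached OnComplete are never candidates.
  if (qa.playOnce && qa.promptCompleted) {
    return false;
  }
  // Rule 1: the activation function is absent or returns true.
  if (qa.clientActivationFunction && qa.clientActivationFunction() !== true) {
    return false;
  }
  if (qa.answers.length > 0) {
    // Rule 2: some semantic item targeted by an Answer is still empty.
    return qa.answers.some((answer) => answer.semanticItem.state === STATE_EMPTY);
  }
  // Rule 3: with no Answers, some Confirm item needs confirmation.
  return qa.confirms.some((item) => item.state === STATE_NEEDS_CONFIRMATION);
}

// Step 1 of RunSpeech: the first active QA in speech index order.
function findFirstActive(qas) {
  const ordered = [...qas].sort((a, b) => a.speechIndex - b.speechIndex);
  return ordered.find(isQaActive) || null;
}
```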

A QA is run as follows:

1. If this is a different control than the previous active control, reset the prompt Count value.

2. Increment the prompt Count value.

3. If PromptSelectFunction is specified, call the function and set the Prompt's inlinePrompt to the returned string.

4. If a Reco object is present, start it. This Reco should already include any active command grammar.

5. Start the Dtmf object if present. (The same concerns apply with regard to command Dtmf grammars.)
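The run steps can be sketched as follows, assuming a hypothetical state object that remembers the previously active control, and reco/dtmf stand-ins exposing a start() method. Passing only the count to PromptSelectFunction is a simplification for illustration.

```javascript
// Hypothetical sketch of the five "run a QA" steps.
function runQa(qa, state) {
  // Steps 1-2: reset the prompt count on a control change, then increment it.
  if (state.previousActive !== qa) {
    qa.count = 0;
    state.previousActive = qa;
  }
  qa.count += 1;
  // Step 3: let PromptSelectFunction choose the inline prompt text.
  if (qa.promptSelectFunction) {
    qa.prompt.inlinePrompt = qa.promptSelectFunction(qa.count);
  }
  // Steps 4-5: start recognition and DTMF collection if present.
  if (qa.reco) {
    qa.reco.start();
  }
  if (qa.dtmf) {
    qa.dtmf.start();
  }
}
```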

A Validator (either a CompareValidator or a CustomValidator) is active if:

1. The SemanticItemToValidate has not been validated by this validator.

A CompareValidator is run as follows:

1. Compare the value of the SemanticItemToValidate with the value of the ElementToCompare or ValueToCompare, according to the validator's Operator.

2. If the test returns false, empty the text field of the SemanticItemToValidate (or of both items if the InvalidateBoth flag is set) and play the prompt.

3. If the test returns true, mark the SemanticItemToValidate as validated by this validator.

A CustomValidator is run as follows:

1. The ClientValidationFunction is called with the value of the SemanticItemToValidate.

2. If the function returns false, the SemanticItem is cleared and the prompt is played; otherwise it is marked as validated by this validator.
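A minimal sketch of both validator types follows. The object shapes, the validatedBy set and the playPrompt callback are hypothetical; the operator names are borrowed from ASP.NET's ValidationCompareOperator as an assumption, and conversion to the validator's Type is omitted, so values are compared as given.

```javascript
// Hypothetical sketch; a semantic item is { value, validatedBy: Set }.
const OPERATORS = {
  Equal: (a, b) => a === b,
  NotEqual: (a, b) => a !== b,
  GreaterThan: (a, b) => a > b,
  LessThan: (a, b) => a < b,
};

// A validator is active while its item has not been validated by it.
function isValidatorActive(v) {
  return !v.semanticItemToValidate.validatedBy.has(v.id);
}

function runCompareValidator(v, playPrompt) {
  const item = v.semanticItemToValidate;
  const other = v.semanticItemToCompare ? v.semanticItemToCompare.value : v.valueToCompare;
  if (OPERATORS[v.operator](item.value, other)) {
    item.validatedBy.add(v.id); // test passed: mark as validated
    return true;
  }
  item.value = ""; // test failed: empty the item (and the other one if InvalidateBoth)
  if (v.invalidateBoth && v.semanticItemToCompare) {
    v.semanticItemToCompare.value = "";
  }
  playPrompt(v.prompt);
  return false;
}

function runCustomValidator(v, playPrompt) {
  const item = v.semanticItemToValidate;
  if (v.clientValidationFunction(item.value)) {
    item.validatedBy.add(v.id);
    return true;
  }
  item.value = ""; // cleared so that the owning QA becomes active again
  playPrompt(v.prompt);
  return false;
}
```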

A Command is considered active if and only if:

1. It is in Scope, AND

2. There is not another Command of the same Type lower in the scope tree.
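One way to read the two Command rules is to treat Scope as a node in the page's control tree. The sketch below assumes that a Command is "in Scope" when its scope node is the active control or one of its ancestors, and that "lower in the scope tree" means closer to the active control; both are interpretations, and activeCommands is a hypothetical helper.

```javascript
// Hypothetical sketch: nodes are { id, parent }; commands are { type, scope }.
function ancestors(node) {
  const chain = [];
  for (let n = node; n; n = n.parent) {
    chain.push(n); // nearest node first
  }
  return chain;
}

function activeCommands(commands, activeControl) {
  const chain = ancestors(activeControl);
  const depth = (cmd) => chain.indexOf(cmd.scope); // -1 means out of scope
  const best = new Map(); // Type -> in-scope command with the lowest scope
  for (const cmd of commands) {
    const d = depth(cmd);
    if (d < 0) {
      continue; // rule 1: must be in scope
    }
    const current = best.get(cmd.type);
    if (!current || d < depth(current)) {
      best.set(cmd.type, cmd); // rule 2: a lower command of the same Type wins
    }
  }
  return [...best.values()];
}
```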

22.2 LastCommandOrException

LastCommandOrException is a global variable maintained by RunSpeech, and its value is passed to several author-defined functions as a parameter. The value is set to the last Command.Type or recognition exception that occurred. The value will be reset to "" when there is a QA transition (the current active QA is different from the previously active QA, or is the first active QA). There is one exception to this rule: if the QA is in a short time-out confirmation state and the current recognition result is "Silence", LastCommandOrException will be set to "" (silence in short time-out confirmation is not an exception, but a valid input).

In this fashion, ClientActivationFunction will always get the LastCommandOrException that occurred anywhere in the page, but the rest of the functions of the active QA will only get a non-empty LastCommandOrException if they have been activated more than once in a row.

If, after processing all the Answers, ExtraAnswers and Confirms in a QA, nothing is matched (either due to a mismatch in the sml returned or to a high reject threshold), LastCommandOrException will be set to "NoReco".

Active Validators will also reset the global LastCommandOrException.



Possible values of LastCommandOrException are:

Platform event                    LastCommandOrException
Prompt fires an onerror event     "PromptError"
Reco fires an onerror event       "RecoError"
Dtmf fires an onerror event       "DtmfError"
Reco fires an onnoreco event      "NoReco"
Reco fires a silence event        "Silence"
Command is activated              Command.Type
Transition to new QA              ""

Also, a PromptSelectFunction's LastCommandOrException will have the value "ShortTimeoutConfirmation" when its QA is in short time-out confirmation mode (i.e., when count == 1, FirstInitialTimeout is non-zero, etc.).
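The reset and update rules for LastCommandOrException can be sketched as a small tracker. The event names passed to onEvent and the LastCommandTracker class are hypothetical; the string values follow the table of possible values, written without internal spaces.

```javascript
// Hypothetical sketch of maintaining LastCommandOrException inside RunSpeech.
const EVENT_VALUES = {
  promptError: "PromptError",
  recoError: "RecoError",
  dtmfError: "DtmfError",
  noReco: "NoReco",
  silence: "Silence",
};

class LastCommandTracker {
  constructor() {
    this.value = "";
    this.activeQa = null;
  }

  // A QA transition (including the first active QA) resets the value to "".
  activate(qa) {
    if (qa !== this.activeQa) {
      this.value = "";
      this.activeQa = qa;
    }
  }

  onEvent(name, { command = null, shortTimeoutConfirmation = false } = {}) {
    if (name === "command") {
      this.value = command.type; // Command.Type of the activated command
    } else if (name === "silence" && shortTimeoutConfirmation) {
      this.value = ""; // silence is valid input here, not an exception
    } else {
      this.value = EVENT_VALUES[name];
    }
  }
}
```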



22.3 Count

Count is exclusively local, both in ClientActivationFunction and in the rest of the functions which are passed count. That is, these functions are always passed the count of their own QA. To avoid confusion, the function ClientActivationFunction will receive the value that the PromptSelectFunction would receive if this QA were active.

22.4 Postback Support

In their simplest form, ASP.NET pages are stateless. They are instantiated, executed, rendered, and disposed of on every round trip to the server. In the visual world, ASP.NET provides the ViewState mechanism to keep track of server control state values that don't otherwise post back as part of an HTTP form. The ASP.NET framework uses ViewState to manage and restore page properties prior to and after postback.

For voice-only pages, the ASP.NET ViewState mechanism is not available to the web developer. However, a similar mechanism is provided by RunSpeech. RunSpeech maintains an object that can be used to store values which authors wish to be persisted across postbacks. The syntax is:

RunSpeech.ClientViewState["MyVariableName"] = myVariableValue;

Any JScript built-in type can be persisted: string, number, boolean, array, object, Date, RegExp, or function. The main difference between the ASP.NET ViewState (for visual pages) and the voice-only ClientViewState mechanism is that authors of voice-only pages must manually declare and set values they wish to maintain across postbacks.

If AutoPostBack is set to true in any speech control, the matching client-side function will always be executed before posting back to the server. If the author wishes to persist any page state across postback, these client-side functions are a good place to invoke the ClientViewState object of RunSpeech.
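A minimal sketch of how ClientViewState could survive a round trip, assuming it is serialised into a string such as a hidden form field. JSON is used here for brevity and only round-trips strings, numbers, booleans, arrays and plain objects, so Date, RegExp and function values, which the text says are supported, would need extra handling. The stand-in RunSpeech object and the handler name OnClientAnswerCity are hypothetical.

```javascript
// Hypothetical stand-in for the RunSpeech object and its ClientViewState.
const RunSpeech = { ClientViewState: {} };

// Before postback: serialise the state, e.g. into a hidden form field.
function saveViewState(runSpeech) {
  return JSON.stringify(runSpeech.ClientViewState);
}

// After the page reloads: restore the state from the serialised string.
function restoreViewState(runSpeech, serialized) {
  runSpeech.ClientViewState = serialized ? JSON.parse(serialized) : {};
}

// An AutoPostBack client-side handler is a natural place to record state.
function OnClientAnswerCity(sender) {
  const previous = RunSpeech.ClientViewState["attempts"] || 0;
  RunSpeech.ClientViewState["attempts"] = previous + 1;
}
```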



23 Confirmation Algorithm

Semantic processing algorithm:

There are three stages for semantic processing:

1) Preprocessing, carried out when a QA is active:

This stage is responsible for creating the array of answers to be considered in this iteration. This includes all the Answers and the Confirms that need confirmation. Internally, it creates a structure as follows:

Answer ID    CurrentValue
Answer ID    CurrentValue

This information is also passed to the PromptSelectFunction, GrammarSelectFunction, etc.



2) Answer Processing

In this stage, we process the Answer objects in the Answers and ExtraAnswers collections. If any item from the Answers collection is matched, a flag indicating this fact is set. Answer processing sets the confirmation status of the associated semantic item; this status can be either NEEDS_CONFIRMATION or CONFIRMED. If the confidence value associated with the smlNode specified by the Answer's XpathTrigger is less than or equal to the Answer's ConfirmThreshold, the status of the semantic item is set to NEEDS_CONFIRMATION. Otherwise it is set to CONFIRMED.
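Stage 2 can be sketched as follows. findNode is a hypothetical stand-in for evaluating an Answer's XpathTrigger against the SML document and returns { text, confidence } or undefined; the return value is the flag recording whether any item from the Answers collection (not ExtraAnswers) matched.

```javascript
// Hypothetical sketch of answer processing.
function processAnswers(answers, extraAnswers, findNode) {
  let answersMatched = false;
  for (const [list, isAnswersCollection] of [[answers, true], [extraAnswers, false]]) {
    for (const answer of list) {
      const node = findNode(answer.xpathTrigger);
      if (!node) {
        continue; // XpathTrigger not present in the result
      }
      answer.semanticItem.value = node.text;
      // Confidence at or below the threshold still needs confirmation.
      answer.semanticItem.status =
        node.confidence <= answer.confirmThreshold ? "NEEDS_CONFIRMATION" : "CONFIRMED";
      if (isAnswersCollection) {
        answersMatched = true; // only the Answers collection sets the flag
      }
    }
  }
  return answersMatched;
}
```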



3) Confirmation Processing:

a) Examine the sml document and search for XpathAcceptConfirms and XpathDenyConfirms. Set a global confirmation state to NEUTRAL (neither was present), ACCEPT (XpathAcceptConfirms was present) or DENY (XpathDenyConfirms was present). In short time-out confirmation, silence sets the confirmation state to ACCEPT.

b) For all items to be confirmed:

   If there is a value in the sml document that matches the XpathTrigger of the confirm item:

      * If the new value is the same as the value to be confirmed, the item is confirmed.
      * Else, the item is set to the new value and processed as an answer.

c) If no Answer object is matched from the Answers or Confirms collections:

      If the confirmation state is ACCEPT, upgrade all items that need confirmation to confirmed.

      If the confirmation state is DENY, clear (empty) all items that need confirmation.

   Else:

      Mark all unmatched items that needed confirmation as confirmed.
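The confirmation steps can be sketched as below, under stated assumptions: sml is a plain object mapping XPath strings to { value, confidence } instead of a real SML document, cleared items are given the hypothetical status "EMPTY", and the final "Else" branch is read as applying when some Answer or Confirm was matched.

```javascript
// Hypothetical sketch of confirmation processing. An item to confirm is
// { xpathTrigger, value, status, confirmThreshold }.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Step a: derive the global confirmation state.
function confirmationState(sml, xpathAccept, xpathDeny, shortTimeoutSilence) {
  if (shortTimeoutSilence) return "ACCEPT"; // silence in short time-out confirmation
  if (has(sml, xpathAccept)) return "ACCEPT";
  if (has(sml, xpathDeny)) return "DENY";
  return "NEUTRAL";
}

// Steps b and c; answersMatched is the flag set during answer processing.
function processConfirms(items, sml, state, answersMatched) {
  let anyMatched = answersMatched;
  const unmatched = [];
  for (const item of items) {
    if (!has(sml, item.xpathTrigger)) {
      unmatched.push(item);
      continue;
    }
    anyMatched = true;
    const node = sml[item.xpathTrigger];
    if (node.value === item.value) {
      item.status = "CONFIRMED"; // the user repeated the same value
    } else {
      // New value: processed as an answer, using the stage 2 threshold rule.
      item.value = node.value;
      item.status = node.confidence <= item.confirmThreshold ? "NEEDS_CONFIRMATION" : "CONFIRMED";
    }
  }
  if (!anyMatched) {
    for (const item of items) {
      if (item.status !== "NEEDS_CONFIRMATION") continue;
      if (state === "ACCEPT") {
        item.status = "CONFIRMED";
      } else if (state === "DENY") {
        item.value = "";
        item.status = "EMPTY";
      }
    }
  } else {
    // Something was matched: implicitly confirm the items left untouched.
    for (const item of unmatched) {
      if (item.status === "NEEDS_CONFIRMATION") item.status = "CONFIRMED";
    }
  }
}
```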

24 Exceptions

The following table lists the exceptions thrown by Speech Controls during render time.

Control/object | Attribute/Method/Object | Condition | Exception

QA | SpeechIndex | SpeechIndex < 0 | ArgumentOutOfRangeException
QA | XpathDenyConfirms | XpathDenyConfirms not specified if Confirm specified | ArgumentNullException
QA | Answers | Answers collection contains a non-answer object | ArgumentException
QA | Prompt | Prompt non-existent in voice-only mode QA | ArgumentNullException
QA | FirstInitialTimeout | FirstInitialTimeout specified without Confirms being specified | InvalidOperationException
QA | FirstInitialTimeout | FirstInitialTimeout < 0 | ArgumentOutOfRangeException
QA | AcceptRejectThreshold | AcceptRejectThreshold < 0 or > 1 | ArgumentOutOfRangeException
QA | DenyRejectThreshold | DenyRejectThreshold < 0 or > 1 | ArgumentOutOfRangeException

Command | SpeechIndex | SpeechIndex < 0 | ArgumentOutOfRangeException
Command | Scope | Scope not valid | ArgumentException
Command | Scope | Scope not specified | ArgumentNullException
Command | Type | Type not specified | ArgumentNullException
Command | Type/Scope | More than 1 Command of same Type has same Scope | InvalidOperationException
Command | AcceptCommandThreshold | AcceptCommandThreshold < 0 or > 1 | ArgumentOutOfRangeException
Command | XpathTrigger | XpathTrigger not specified | ArgumentNullException
Command | AutoPostBack | AutoPostBack is true and Triggered handler not specified | InvalidOperationException
Command | AutoPostBack | AutoPostBack is false and Triggered handler is specified | InvalidOperationException

CompareValidator | SpeechIndex | SpeechIndex < 0 | ArgumentOutOfRangeException
CompareValidator | SemanticItemToCompare | One of SemanticItemToCompare and ValueToCompare is not specified | InvalidOperationException
CompareValidator | ValueToCompare | One of SemanticItemToCompare and ValueToCompare is not specified | InvalidOperationException
CompareValidator | ValueToCompare | ValueToCompare cannot be converted to a valid Type | InvalidOperationException
CompareValidator | SemanticItemToValidate | SemanticItemToValidate not specified | ArgumentNullException

CustomValidator | SpeechIndex | SpeechIndex < 0 | ArgumentOutOfRangeException
CustomValidator | SemanticItemToValidate | SemanticItemToValidate not specified | ArgumentNullException
CustomValidator | ClientValidationFunction | ClientValidationFunction not specified | ArgumentNullException

Answer object | XpathTrigger | XpathTrigger not specified for Answers or ExtraAnswers | ArgumentNullException
Answer object | ConfirmThreshold | ConfirmThreshold < 0 or > 1 | ArgumentOutOfRangeException
Answer object | Reject | Reject < 0 or > 1 | ArgumentOutOfRangeException
Answer object | AutoPostBack | Answer.Triggered has a handler but Answer.AutoPostBack is false | InvalidOperationException

SemanticItem object | TargetElement | TargetElement specifies multiple ids |
SemanticItem object | TargetAttribute | TargetAttribute is not specified when TargetElement is specified | ArgumentNullException
SemanticItem object | BindAt | BindAt set to an invalid value | ArgumentException
SemanticItem object | BindAt | BindAt is "server" and SemanticItem.TargetElement is not a server-side control | ArgumentException
SemanticItem object | BindAt | BindAt is "server" and SemanticItem.TargetAttribute is not a member of the control specified by SemanticItem.TargetElement | ArgumentException
SemanticItem object | BindAt | BindAt is "server" and SemanticItem.TargetAttribute is a member of SemanticItem.TargetElement, but is not of type string | ArgumentException
SemanticItem object | BindAt | BindAt is "server" and SemanticItem.TargetAttribute is a string, but is read-only | ArgumentException

Reco object | initialTimeout | initialTimeout negative | ArgumentOutOfRangeException
Reco object | babbleTimeout | babbleTimeout negative | ArgumentOutOfRangeException
Reco object | maxTimeout | maxTimeout negative | ArgumentOutOfRangeException
Reco object | endSilence | endSilence negative | ArgumentOutOfRangeException
Reco object | reject | reject < 0 or > 1 | ArgumentOutOfRangeException
Reco object | Grammars | Grammars collection contains a non-grammar object | ArgumentException
Reco object | Params | name not specified | ArgumentNullException
Reco object | Params | contains a non-param object | ArgumentException

Grammar object | src/InlineGrammar | One of src or InlineGrammar is not specified | ArgumentNullException

Prompt object | Params | name not specified | ArgumentNullException
Prompt object | Params | contains a non-param object | ArgumentException

Dtmf object | initialTimeout | initialTimeout < 0 | ArgumentOutOfRangeException
Dtmf object | interdigitTimeout | interdigitTimeout < 0 | ArgumentOutOfRangeException
Dtmf object | endSilence | endSilence < 0 | ArgumentOutOfRangeException
Dtmf object | Params | name not specified | ArgumentNullException
Dtmf object | Params | contains a non-param object | ArgumentException

Stylesheet | | contains an object which is not a Style object | ArgumentException

Style object | StyleReference | StyleReference is invalid | ArgumentException

SmexMessageBase | Timer | Timer < 0 | ArgumentOutOfRangeException

SmexMessage | Message | Message not specified | ArgumentNullException

MakeCall | CalledDirectoryNumber | CalledDirectoryNumber not specified | ArgumentNullException
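The range checks in the table are straightforward to mirror on the client. The sketch below covers the Reco object rows for timeouts and reject only; since JavaScript has no ArgumentOutOfRangeException, a RangeError subclass carrying that name stands in for it, and validateReco is a hypothetical helper.

```javascript
// Hypothetical stand-in for the .NET ArgumentOutOfRangeException.
class ArgumentOutOfRangeError extends RangeError {
  constructor(param) {
    super(param + " is out of range");
    this.name = "ArgumentOutOfRangeException";
    this.param = param;
  }
}

// Mirrors the Reco object rows of the exceptions table (timeouts and reject).
function validateReco(reco) {
  for (const name of ["initialTimeout", "babbleTimeout", "maxTimeout", "endSilence"]) {
    if (reco[name] !== undefined && reco[name] < 0) {
      throw new ArgumentOutOfRangeError(name); // timeouts must not be negative
    }
  }
  if (reco.reject !== undefined && (reco.reject < 0 || reco.reject > 1)) {
    throw new ArgumentOutOfRangeError("reject"); // thresholds lie in [0, 1]
  }
}
```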











26 Terms and Definitions

Voice-only: A mode of dialog that utilizes only speech input and output. There are no visual elements presented to the end user. Voice-only dialog typically implies end user communication via the telephone. However, voice-only interaction may occur in a desktop computer setting.

Multimodal: A mode of dialog that utilizes speech input and visual output. Multimodal typically implies end user communication with a dialog via a hand-held computing device such as a pocket PC.

Tap-and-talk: A form of dialog interaction that utilizes speech input and visual output. This form of dialog interaction typically occurs on a hand-held computer such as a pocket PC. The end user selects ("taps") the visual element with a stylus or pen-like device and provides input to the visual element using speech ("talk").

Mixed Initiative: A form of dialog interaction model, whereby the user is permitted to share the dialog initiative with the system, e.g. by providing more answers than requested by a prompt, or by switching task when not prompted to do so.

SAPI SML: SAPI Semantic Markup Language. The XML document returned by SAPI 6.0 when an utterance is determined to be in-grammar. (SAPI SML is a SAPI-specific return format. Speech tags interpreters are agnostic to the actual content format of the returned document, provided it is an XML document.) SAPI SML contains semantic values, confidence scores and the words used by the speaker. (It is generated by script or XSLT instructions contained within the grammar rules.) SAPI SML is described in greater detail in the Speech Core document SML Generation.

CSTA: Computer Supported Telecommunications Applications, an ECMA standard. From the ECMA document: "CSTA is an interface that provides access to telecommunication functions that may be used with your phone (or many other communication devices) and may also be used by 3rd party applications such as Contact/Call Centres (e.g. ACD systems)."
http://www.ecma.ch/ecmal/TOPICS/TC32/TGll/CSTA.HTM

System Initiative: A form of dialog interaction model, whereby the system holds the initiative, and drives the dialog with typically simple questions to which only a single answer is possible.

XPath: XML Path language, a W3C recommendation for addressing parts of an XML document. See http://www.w3.org/TR/xpath.



27 Platform parameter settings

The <param> mechanism (described in the sections Prompt object contents, Reco object contents and Dtmf object contents) is used to configure platform settings. The following "params" are recognized by all Microsoft platforms:

Object: Prompt
Name: server
Value: URI describing the location of the speech server
Default: http://localhost (client) and registry setting (telephony-server)
Description: This configuration setting selects the speech server used for speech processing.

Object: Prompt
Name: bargeintype
Value: This attribute sets the type of recognition input event that the browser uses to determine whether an onbargein event should be fired. There are three types of bargeintype that can be set: "speech", "grammar" and "final".
Default: The default setting is "speech". If the platform does not support the type selected, the browser defaults to "speech".
Description: The barge-in types are defined as:
   speech: This represents speech/sound/energy ("SOUND_START") detected by the recognition engine.
   grammar: This represents the audio partially matching the recognition grammar. The speech server will generate a "PHRASE_START" event, and possibly a semantic event (a semantic property in the phrase hypothesis has confidence greater than the confidence threshold). The client decides when to throw "onbargein" based on the capabilities sent by the speech server when a session is opened. The confidence threshold used by the semantic event is a client platform setting.
   final: This represents using a "valid" final recognition result (i.e. a result where the utterance confidence level is above the "reject" threshold). Run in conjunction with multiple recognition mode, this represents the recognizer continuously listening for a valid result, for hotword/wake-up style scenarios. Note that in this case the browser must fire onbargein before firing onreco.

Object: Reco
Name: server
Value: URI describing the location of the speech server
Default: http://localhost (client) and registry setting (telephony-server)
Description: This configuration setting selects the speech server used for speech processing.
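The bargeintype defaulting rule and the event each type reacts to can be sketched as follows. resolveBargeinType and shouldFireBargein are hypothetical helpers, the event objects are simplified stand-ins, and treating either PHRASE_START or a semantic event as sufficient for "grammar" is a simplification, since the text leaves that decision to the client based on the speech server's capabilities.

```javascript
// Hypothetical sketch of the bargeintype param handling.
const BARGEIN_TYPES = ["speech", "grammar", "final"];

// "speech" is the default and the fallback for unsupported or unknown types.
function resolveBargeinType(requested, platformSupported) {
  if (!requested) return "speech";
  if (!BARGEIN_TYPES.includes(requested) || !platformSupported.includes(requested)) {
    return "speech";
  }
  return requested;
}

// Simplified mapping from recognition events to the onbargein decision.
function shouldFireBargein(type, event) {
  switch (type) {
    case "speech":
      return event.kind === "SOUND_START";
    case "grammar":
      return event.kind === "PHRASE_START" || event.kind === "semantic";
    case "final":
      // A "valid" final result: confidence above the reject threshold.
      return event.kind === "reco" && event.confidence > event.rejectThreshold;
    default:
      return false;
  }
}
```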













5 28 DET Descriptions 

The following table lists brief descriptions for each 
control, object and attribute. These descriptions will be 
used by the DET tool and exposed to the dialog author using 
Visual Studio. 



Control/obj 
ect 


Attribute/Method/Ob j 
ect 


Brief description 


QA 


Id 


Programmatic name of the 
control 




Speechlndex 


Activation order of the control 




CI ientAct ivat ionFunc 
tion 


Client-side function used to 
determine whether or not to 
activate the QA control 




OnCl ientAct ive 


Client-side function called 
after QA is determined to be 
active 




OnClient Complete 


Client-side function called 
after execution of QA 
(successfully or not) . 




OnCl ient Listening 


Client-side function called 
after successful start of the 
reco object 




AllowCommands 


Whether or not Commands may be 
activated for this QA 




PlayOnce 


Whether or not this QA may be 
activated more than once per 
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page 




XpathAcceptConf irms 


The path in the sml document 
that indicates the confirm 
items were accepted 




Xpa t hDenyConf i rms 


The path in the sml document ' 
that indicates the confirm 
items were denied 




First InitialTimeout 


Specifies initial timeout when 
QA. Count ==1 . 




Answers 


An array of answer objects 




ExtraAnswers 


An array of answer objects 




Confirms 


An array of answer objects. 




Prompt 


The Prompt object for this QA 




Reco 


The Reco object for this QA 




Dtmf 


The Dtmf object for this QA 








Command 


Id 


Programmatic name of the 
control 




Speechlndex 


Activation order of the control 




Scope 


The id of ASP.NET control that 
activates this Command grammar 




Type 


The type of this Command in 
order to allow the overriding 
of identically typed commands 




XpathTrigger 


SML document path that triggers 
this command 




Accept CommandThresho 
Id 


Confidence level of recognition 
that is necessary to trigger 
this command 




OnClient Command 


Function to execute on 
recognition of this Command's 
grammar 




AutoPostBack 


Whether or not Command control 
posts back to server when 
Command grammar is recognized. 




Prompt 


A Prompt object 




Grammar 


The grammar object which will 
listen for the command 




Dtmf 


The Dtmf object which will 
activate the command 








CompareValidator
  Id                            Programmatic name of the control
  SpeechIndex                   Activation order of the control
  Type                          Sets the datatype of the comparison
  ElementToCompare              The JScript variable or Id of the SemanticItem used as the basis for the comparison
  SemanticItemToValidate        The Id of the control that is being validated
  SemanticItemToCompare         The Id of the control that is the basis for comparison
  Operator                      Validation operator
  InvalidateBoth                Whether or not to invalidate both ElementToCompare and ElementToValidate
  Prompt                        Prompt to indicate the error
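
The effect of the CompareValidator's Type and Operator properties can be illustrated with a small comparison routine. The operator and type names are borrowed from the ASP.NET CompareValidator as an assumption (the table only says "Validation operator" and "datatype of the comparison"), Date handling is omitted, and the function names are hypothetical.

```typescript
// Type and Operator values borrowed from the ASP.NET CompareValidator (assumption).
type CompareType = "String" | "Integer" | "Double";
type CompareOperator =
  | "Equal" | "NotEqual"
  | "GreaterThan" | "GreaterThanEqual"
  | "LessThan" | "LessThanEqual";

// Parses a recognized string for the numeric types; undefined if it does not parse.
function toNumber(value: string, type: CompareType): number | undefined {
  const s = value.trim();
  if (type === "Integer") {
    return /^[+-]?\d+$/.test(s) ? parseInt(s, 10) : undefined;
  }
  const n = Number(s);
  return s !== "" && !Number.isNaN(n) ? n : undefined;
}

// Returns -1, 0 or 1, or undefined if either value cannot be converted to `type`.
function compareValues(a: string, b: string, type: CompareType): number | undefined {
  if (type === "String") {
    return a < b ? -1 : a > b ? 1 : 0; // plain lexical ordering
  }
  const x = toNumber(a, type);
  const y = toNumber(b, type);
  if (x === undefined || y === undefined) return undefined;
  return x < y ? -1 : x > y ? 1 : 0;
}

// True when the item being validated satisfies `op` against the comparison item.
function isValid(toValidate: string, toCompare: string, type: CompareType, op: CompareOperator): boolean {
  const c = compareValues(toValidate, toCompare, type);
  if (c === undefined) return false; // unconvertible input fails validation
  switch (op) {
    case "Equal": return c === 0;
    case "NotEqual": return c !== 0;
    case "GreaterThan": return c > 0;
    case "GreaterThanEqual": return c >= 0;
    case "LessThan": return c < 0;
    case "LessThanEqual": return c <= 0;
    default: return false;
  }
}
```

The Type setting matters: isValid("10", "9", "Integer", "GreaterThan") is true, while the same values compared as "String" are ordered lexically and the check fails.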


CustomValidator
  id                            Programmatic name of the control
  SpeechIndex                   Activation order of the control
  SemanticItemToValidate        The Id of the control that is being validated
  AttributeToValidate           Attribute of the ElementToValidate that contains the value being validated
  ClientValidationFunction      Validation function
  Prompt                        Prompt to indicate the error


Answer object
  id                            Programmatic name of the object
  XpathTrigger                  The part of the SML document this answer refers to
  ClientNormalizationFunction   Function that returns an author-specified transformation of the recognized item
  SemanticItem                  The semantic item to which this answer should be written
  ConfirmThreshold              The minimum confidence level of recognition necessary to mark this item as confirmed
  Reject                        Rejection threshold for the Answer
  OnClientAnswer                Function to be called when the XpathTrigger is matched
  AutoPostBack                  Whether or not to post back to the server each time the user interacts with the control
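
A sketch of how an Answer object's Reject and ConfirmThreshold values and its ClientNormalizationFunction might be applied once its XpathTrigger has matched. Treating a confidence below Reject as discarding the answer is an assumption, as are the SemanticItem shape and the applyAnswer name; only the ConfirmThreshold rule comes directly from the table.

```typescript
// Hypothetical stand-ins for a SemanticItem and an Answer object.
interface SemanticItem {
  value?: string;
  confirmed: boolean;
}

interface AnswerSpec {
  id: string;
  semanticItem: SemanticItem;
  confirmThreshold: number; // minimum confidence to mark the item confirmed
  reject: number;           // below this confidence the answer is ignored (assumed)
  clientNormalizationFunction?: (recognized: string) => string;
}

type AnswerOutcome = "rejected" | "filled" | "confirmed";

// Applies one recognized item that has already matched the Answer's XpathTrigger.
function applyAnswer(answer: AnswerSpec, recognized: string, confidence: number): AnswerOutcome {
  if (confidence < answer.reject) {
    return "rejected"; // the semantic item is left untouched
  }
  const normalize = answer.clientNormalizationFunction;
  answer.semanticItem.value = normalize ? normalize(recognized) : recognized;
  answer.semanticItem.confirmed = confidence >= answer.confirmThreshold;
  return answer.semanticItem.confirmed ? "confirmed" : "filled";
}

const city: SemanticItem = { confirmed: false };
const cityAnswer: AnswerSpec = {
  id: "cityAnswer",
  semanticItem: city,
  confirmThreshold: 0.8,
  reject: 0.2,
  clientNormalizationFunction: (s) => s.trim().toUpperCase(),
};
```

With these values, a recognition at confidence 0.1 is rejected and leaves the item empty, " seattle " at 0.5 is stored as "SEATTLE" but left unconfirmed, and a recognition at 0.9 marks the item confirmed.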


Prompt object
  id                            Programmatic name of the object
  type                          MIME type corresponding to the speech output format
  prefetch                      Whether or not the prompt should be immediately synthesized and cached at the browser when the page is loaded
  lang                          The language of the prompt content
  bargein                       Whether or not the speech platform is responsible for stopping prompt playback when speech or DTMF input is detected
  PromptSelectFunction          Function that selects and/or modifies a prompt string prior to playback
  OnClientBookmark              Function which is called when a bookmark is reached in the prompt text during playback
  OnClientError                 Function called in response to an error event in the client
  InLinePrompt                  Text of the prompt
  Params                        Specifies non-standard speech platform configuration values
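
The table describes PromptSelectFunction only as a function that selects or modifies the prompt string before playback. The sketch below assumes a hypothetical signature that receives the InLinePrompt text and the QA's Count, which would let a page play a shorter reprompt on later turns; all names are illustrative.

```typescript
// Hypothetical signature: the table does not give PromptSelectFunction's
// parameters, so this assumes it receives the InLinePrompt text and the QA's Count.
type PromptSelectFunction = (inLinePrompt: string, count: number) => string;

// Example selector: play the full prompt first, then a shorter reprompt.
const selectCityPrompt: PromptSelectFunction = (text, count) =>
  count > 1 ? "Sorry, which city?" : text;

// Resolves the string handed to the speech platform just before playback.
function promptToPlay(inLinePrompt: string, count: number, select?: PromptSelectFunction): string {
  return select ? select(inLinePrompt, count) : inLinePrompt;
}
```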


Reco object
  Id                            Programmatic name of the object
  StartElement                  Name of the GUI element to throw the start event
  StartEvent                    Name of the GUI event that will activate the underlying client-side Reco object
  StopElement                   Name of the GUI element to throw the stop event
  StopEvent                     Name of the GUI event that will deactivate the underlying client-side Reco object
  initialTimeout                The time in milliseconds between the start of recognition and the detection of speech
  babbleTimeout                 The period of time in milliseconds in which the recognizer must return a result after the detection of speech
  maxTimeout                    The period of time in milliseconds between the start of recognition and the return of results to the browser
  endSilence                    The period of silence in milliseconds after the end of an utterance after which the recognition results are returned
  Reject                        The rejection threshold below which the platform will throw the noReco event
  Lang                          The language of the speech recognition engine
  Mode                          Specifies the recognition mode to be followed
  GrammarSelectFunction         Client-side function called prior to starting the recognition process
  OnClientSilence               Client-side function that will be called after detecting silence
  OnClientNoReco                Client-side function that will be called after detecting no recognition
  OnClientError                 Client-side function that will be called after recognition errors
  OnClientSpeechDetected        Client-side function called when the recognition platform detects speech
  Grammars                      An array of grammar objects
  Params                        Specifies non-standard speech platform configuration values
  Record                        Used for recording audio input from the user
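
The four Reco timing properties can be related on a single timeline measured from the start of recognition. The sketch classifies one recognition attempt; it assumes all timeouts are positive, that results are returned as soon as endSilence has elapsed after the utterance, and that whichever deadline passes first determines the outcome. The types and the classifyAttempt function are hypothetical.

```typescript
// Reco timing properties from the table, in milliseconds (assumed positive).
interface RecoTimeouts {
  initialTimeout: number;
  babbleTimeout: number;
  maxTimeout: number;
  endSilence: number;
}

// Hypothetical timeline of one attempt, in ms since recognition start.
interface RecoTimeline {
  speechDetectedAt?: number; // undefined: the user never spoke
  speechEndedAt?: number;    // undefined: the utterance never ended
}

type RecoOutcome = "result" | "initialTimeout" | "babbleTimeout" | "maxTimeout";

function classifyAttempt(t: RecoTimeouts, tl: RecoTimeline): RecoOutcome {
  const s = tl.speechDetectedAt;
  // No speech before the earlier of initialTimeout and maxTimeout.
  if (s === undefined || s > Math.min(t.initialTimeout, t.maxTimeout)) {
    return t.initialTimeout <= t.maxTimeout ? "initialTimeout" : "maxTimeout";
  }
  // Results are assumed to be returned once endSilence has elapsed after the utterance.
  const resultAt = tl.speechEndedAt === undefined ? undefined : tl.speechEndedAt + t.endSilence;
  const babbleDeadline = s + t.babbleTimeout;
  if (resultAt !== undefined && resultAt <= Math.min(babbleDeadline, t.maxTimeout)) {
    return "result";
  }
  // Otherwise whichever deadline came first ends the attempt.
  return babbleDeadline <= t.maxTimeout ? "babbleTimeout" : "maxTimeout";
}
```

For example, with initialTimeout 3000, babbleTimeout 10000, maxTimeout 15000 and endSilence 500, speech detected at 1000 ms and ending at 2500 ms yields a result at 3000 ms, whereas a caller who starts speaking at 1000 ms and never stops hits babbleTimeout at 11000 ms.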


Grammar
  id                            Programmatic name of the object
  type                          MIME type of the grammar format used
  lang                          Language of the grammar
  src                           URI of the grammar to load
  InLineGrammar                 Text of the grammar


Dtmf object
  id                            Programmatic name of the object
  numDigits                     Number of key presses required to end the DTMF collection session
  autoflush                     Whether or not to automatically flush the DTMF buffer on the underlying telephony interface card before activation
  terminalChar                  Terminating key to end the DTMF collection session
  initialTimeout                Number of milliseconds to wait between activation and the first key press before raising a timeout event
  interdigitTimeout             Number of milliseconds to wait between key presses before raising a timeout event
  SMLContext                    DTMF results wrapped in SML tags
  OnClientSilence               Function that executes if there is no DTMF key press before initialTimeout expires
  OnClientKeyPress              Function that executes on every pressing of a legal DTMF key
  OnClientError                 Function that executes if a serious or fatal error occurs with the DTMF collection/recognition process
  Params                        Specifies non-standard DTMF engine configuration values

Params
  name                          The name of the parameter to be configured
  Value                         The value assigned to the named parameter

record
  enabled                       Whether or not to record user input
  type                          MIME type of the file containing the recorded audio
                                Whether or not to play a beep before recording begins
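
The Dtmf object's numDigits, terminalChar, initialTimeout and interdigitTimeout properties together decide when a DTMF collection session ends. The sketch replays a list of timestamped key presses against those settings; the KeyPress input format, the result shape and the collectDtmf name are assumptions, and autoflush, SMLContext and the client callbacks are left out.

```typescript
// Subset of the Dtmf object properties from the table (times in milliseconds).
interface DtmfConfig {
  numDigits: number;
  terminalChar: string;
  initialTimeout: number;
  interdigitTimeout: number;
}

// Hypothetical input format: each key press with its time since activation (ms).
interface KeyPress {
  key: string;
  at: number;
}

type DtmfResult =
  | { status: "complete"; digits: string } // numDigits reached or terminalChar pressed
  | { status: "silence" }                  // no key press before initialTimeout
  | { status: "timeout"; digits: string }; // interdigitTimeout expired mid-entry

function collectDtmf(cfg: DtmfConfig, presses: KeyPress[]): DtmfResult {
  let digits = "";
  let lastAt = 0;      // activation time, then the time of the latest key press
  let pressed = false; // whether any key has been accepted yet
  for (const p of presses) {
    const limit = pressed ? cfg.interdigitTimeout : cfg.initialTimeout;
    if (p.at - lastAt > limit) break; // the timer expired before this key arrived
    pressed = true;
    lastAt = p.at;
    // The terminating key ends the session and is not kept in the digits (assumed).
    if (p.key === cfg.terminalChar) return { status: "complete", digits };
    digits += p.key;
    if (digits.length >= cfg.numDigits) return { status: "complete", digits };
  }
  // A timer expired, or the input ran out, before a terminating condition was met.
  return pressed ? { status: "timeout", digits } : { status: "silence" };
}
```

For instance, with numDigits 4 and terminalChar "#", the presses 1, 2, # end the session early with the digits "12", while a gap longer than interdigitTimeout after the first key ends it with a timeout.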



